Deterministic wavelet thresholding for general-error metrics

ABSTRACT

Novel, computationally efficient schemes for deterministic wavelet thresholding with the objective of optimizing maximum-error metrics are provided. An optimal low polynomial-time algorithm for one-dimensional wavelet thresholding based on a new dynamic-programming (DP) formulation is provided that can be employed to minimize the maximum relative or absolute error in the data reconstruction. Directly extending a one-dimensional DP algorithm to multi-dimensional wavelets results in a super-exponential increase in time complexity with the data dimensionality. Thus, novel, polynomial-time approximation schemes (with tunable approximation guarantees for the target maximum-error metric) for deterministic wavelet thresholding in multiple dimensions are also provided.

CROSS-REFERENCES TO RELATED APPLICATIONS

This application is a divisional of pending U.S. patent application Ser. No. 11/152,842, filed Jun. 13, 2005, entitled “DETERMINISTIC WAVELET THRESHOLDING FOR GENERAL-ERROR METRICS,” (Attorney Docket No. Garofalakis 24-8), which is incorporated herein by reference in its entirety.

FIELD OF THE INVENTION

The present invention relates generally to the field of wavelet synopses and, in particular, relates to computationally-efficient schemes for deterministic maximum-error wavelet thresholding in one and multiple dimensions.

BACKGROUND OF THE INVENTION

Several studies have demonstrated the effectiveness of the wavelet decomposition as a tool for reducing large amounts of data down to compact wavelet synopses that can be used to obtain fast, accurate approximate answers to user queries. While conventional wavelet synopses are based on greedily minimizing the overall root-mean-squared (i.e., L₂-norm) error in the data approximation, recent work has demonstrated that such synopses can suffer from important problems, including severe bias and wide variance in the quality of the data reconstruction, and lack of non-trivial guarantees for individual approximate answers. As a result, probabilistic thresholding schemes have been recently proposed as a means of building wavelet synopses that try to probabilistically control other approximation-error metrics, such as the maximum relative error in data-value reconstruction, which is arguably the most important for approximate query answers and meaningful error guarantees.

One of the main open problems posed by this earlier work is whether it is possible to design efficient deterministic wavelet-thresholding algorithms for minimizing general, non-L₂ error metrics that are relevant to approximate query processing systems, such as maximum relative or maximum absolute error. Such algorithms can guarantee better wavelet synopses and avoid the pitfalls of probabilistic techniques (e.g., bad coin-flip sequences) leading to poor solutions.

SUMMARY

Various deficiencies of the prior art are addressed by embodiments of the present invention including novel, computationally efficient schemes for deterministic wavelet thresholding with the objective of optimizing general approximation-error metrics. Embodiments include an optimal low polynomial-time algorithm for one-dimensional wavelet thresholding based on a dynamic-programming (DP) formulation that can be employed to minimize the maximum relative or absolute error in the data reconstruction. Directly extending the one-dimensional DP algorithm to multi-dimensional wavelets results in a super-exponential increase in time complexity with the data dimensionality. Thus, embodiments also include novel, polynomial-time approximation schemes (with tunable approximation guarantees for the target maximum-error metric) for deterministic wavelet thresholding in multiple dimensions. These embodiments for optimal and approximate thresholding algorithms for maximum error are extended to handle a broad, natural class of distributive error metrics, which includes several error measures, such as mean weighted relative error and weighted L_(p)-norm error. Experimental results on real-world and synthetic data sets demonstrate the effectiveness of the embodiments for these extensions.

One embodiment is a method for deterministic wavelet thresholding for optimizing maximum-error metrics. A deterministic wavelet synopsis is built that minimizes maximum error in a data-value approximation. A minimum maximum error for all data values in a subtree for the deterministic wavelet synopsis is provided.

Another embodiment is another method for deterministic wavelet thresholding for optimizing maximum-error metrics. A minimum error value for an error subtree is conditioned not only on a root node of the error subtree and the amount of synopsis storage allotted, but also on an error that enters the subtree through coefficient selections made on a path from the root node to a particular node, excluding that particular node itself. All possible coefficient selections are tabulated in polynomial time.

Yet another embodiment is another method for deterministic wavelet thresholding for optimizing maximum-error metrics. A range of all possible error contributions for paths up to a root node of an error subtree is approximately covered using a plurality of approximate error values. The approximate error values are tabulated by capturing approximate error in a plurality of error subtrees. Each error subtree is rooted at a particular node. The rounded additive error contribution due to ancestors of the particular node being discarded from a synopsis is tabulated.

Still another embodiment is another method for deterministic wavelet thresholding for optimizing maximum-error metrics. Each coefficient is replaced with a scaled-down coefficient value in an error tree for multi-dimensional wavelet thresholding. Dynamic programming is performed over the error tree. A table is built for each node in the error tree. The table holds a minimum maximum absolute error in a subtree for that node and an error contribution due to ancestors of that node. All coefficients in a set of coefficients in a synopsis that are greater than a predefined constant are retained. An error approximation is calculated and provided.

BRIEF DESCRIPTION OF THE DRAWINGS

The teachings of the present invention can be readily understood by considering the following detailed description in conjunction with the accompanying drawings, in which:

FIG. 1A depicts the error tree for a simple example data vector;

FIG. 1B depicts the support regions and signs of the sixteen nonstandard, two-dimensional Haar coefficients in the corresponding locations of a 4×4 wavelet-coefficient array W_(A);

FIG. 2 depicts the error tree structure for the two-dimensional 4×4 Haar coefficient array in FIG. 1B;

FIG. 3 is a block diagram showing the input and output for an exemplary method of optimal deterministic thresholding for maximum error in one dimension (MinMaxErr);

FIGS. 4A, 4B, and 4C are flowcharts describing the exemplary method MinMaxErr of FIG. 3;

FIG. 5 is a block diagram showing input and output for an exemplary embodiment of an ε-additive error approximation scheme for absolute/relative error minimization for deterministic multi-dimensional wavelet thresholding (AddMultIDErr);

FIGS. 6A and 6B are flowcharts describing the exemplary embodiment of the scheme AddMultIDErr of FIG. 5;

FIG. 7 is a block diagram showing input and output for an exemplary embodiment of a (1+ε) approximation scheme for absolute error minimization for deterministic multi-dimensional wavelet thresholding (ApproxMultIDErr);

FIGS. 8A, 8B, and 8C are flowcharts describing the exemplary embodiment of the scheme ApproxMultIDErr of FIG. 7; and

FIG. 9 is a high level block diagram showing a computer.

To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the figures.

DETAILED DESCRIPTION OF THE INVENTION

The invention will be primarily described within the general context of an embodiment of deterministic wavelet thresholding for maximum-error metrics. However, those skilled in the art and informed by the teachings herein will realize that the invention is applicable to database management systems, query processing, discrete mathematics, combinatorics, combinatorial algorithms, signal processing, image processing, performance, data synopses, wavelets, approximate query processing, and many other different kinds of data processing.

Approximate query processing over precomputed data synopses has emerged as a cost-effective approach for dealing with the huge data volumes, the high query complexities, and the increasingly stringent response-time requirements that characterize today's decision support systems (DSS) applications. Typically, DSS users pose very complex queries to the underlying database management system (DBMS) that require complex operations over gigabytes or terabytes of disk-resident data and, thus, take a very long time to execute to completion and produce exact answers. Due to the exploratory nature of many DSS applications, there are a number of scenarios in which an exact answer may not be required, and a user may in fact prefer a fast, approximate answer. For example, during a drill-down query sequence in ad-hoc data mining, initial queries in the sequence frequently have the sole purpose of determining the truly interesting queries and regions of the database. Providing (reasonably accurate) approximate answers to these initial queries gives users the ability to focus their explorations quickly and effectively, without consuming inordinate amounts of valuable system resources. An approximate answer can also provide useful feedback on how well-posed a query is, allowing DSS users to make an informed decision on whether they would like to invest more time and resources to execute their query to completion. Moreover, approximate answers obtained from appropriate synopses of the data may be the only available option when the base data is remote and unavailable. Finally, for DSS queries requesting a numerical answer (e.g., total revenues or annual percentage), it is often the case that the full precision of the exact answer is not needed and the first few digits of precision will suffice (e.g., the leading few digits of a total in the millions or the nearest percentile of a percentage).

Wavelets provide a mathematical tool for the hierarchical decomposition of functions, with a long history of successful applications in signal and image processing. Recent studies have also demonstrated the applicability of wavelets to selectivity estimation and to approximate query processing over massive relational tables and data streams. Briefly, the idea is to apply wavelet decomposition to the input relation (attribute column(s) or online analytical processing (OLAP) cube) to obtain a compact data synopsis that comprises a select small collection of wavelet coefficients. Some experimental results have demonstrated that fast and accurate approximate query processing engines can be designed to operate solely over such compact wavelet synopses.

A major shortcoming of conventional wavelet-based techniques for approximate query processing (including all the above-cited studies) is the fact that the quality of approximate answers can vary widely and no meaningful error guarantees can be provided to the users of the approximate query engine. Coefficients in conventional wavelet synopses are typically chosen to optimize the overall root-mean-squared (i.e., L₂-norm average) error in the data approximation, which, as demonstrated in a recent study, can result in wide variance as well as severe bias in the quality of the approximation over the underlying domain of data values. One proposed solution, termed probabilistic wavelet synopses, relies on probabilistic-thresholding schemes (based on randomized rounding) for synopsis construction that try to probabilistically control other approximation-error metrics, such as the maximum relative error in the data reconstruction. Such maximum relative-error metrics are arguably the most important for effective approximate query processing and can provide meaningful error guarantees for individual approximate answers. In order to probabilistically control maximum relative error, some algorithms previously proposed explicitly try to minimize appropriate probabilistic metrics (such as normalized standard error or normalized bias) for the randomized synopsis construction process. Similar schemes are also given for controlling maximum absolute error.

A potential problem with some probabilistic-thresholding techniques of the past is that, exactly due to their probabilistic nature, there is always a possibility of a bad sequence of coin flips resulting in a poor synopsis; furthermore, they are based on a quantization of the possible synopsis-space allotments, whose impact on the quality of the final synopsis is not entirely clear. A deterministic thresholding algorithm that explicitly minimizes the relevant maximum-error metric (e.g., maximum relative error) in the synopsis is always guaranteed to give better results. Unfortunately, the existing thresholding algorithms depend critically on the probabilistic nature of their solution, and are inapplicable in a deterministic setting. An additional shortcoming of the probabilistic thresholding schemes is that they explicitly target only maximum-error metrics; it is not at all clear if or how they can be extended to other useful error measures for approximate query answers, such as, for example, mean relative error. In fact, one of the main open problems is whether it is possible to design efficient deterministic thresholding for minimizing non-L₂ error metrics that are relevant for approximate query answers, such as the maximum relative or absolute error in the data approximation.

Embodiments of the present invention include novel, computationally efficient schemes for deterministic wavelet thresholding with an objective of optimizing a general class of error metrics (e.g., maximum or mean relative error). An optimal low polynomial-time algorithm for one-dimensional wavelet thresholding is provided. This algorithm is based on a novel dynamic-programming (DP) formulation and can be employed to minimize either maximum relative error or maximum absolute error in the data reconstruction—its running time and working-space requirements are only O(N²) and O(N min{B, log N}), respectively, where N denotes the size of the data domain and B is the desired size of the synopsis (i.e., number of retained coefficients). Directly extending the optimal DP algorithm to multi-dimensional wavelets results in a super-exponential increase in time complexity with the data dimensionality, rendering such a solution unusable even for the relatively small dimensionalities where wavelets are typically used (e.g., 2-5 dimensions). Thus, two embodiments of efficient polynomial-time approximation schemes (with tunable ε-approximation guarantees for the target maximum-error metric) for deterministic wavelet thresholding in multiple dimensions are provided. Both embodiments of the approximation schemes are based on approximate dynamic programs that tabulate a much smaller number of sub-problems than the optimal DP solution, while guaranteeing a small deviation from the optimal objective value. More specifically, one embodiment of the approximation algorithm gives ε-additive-error guarantees for maximum relative or absolute error, whereas the other embodiment is a (1+ε)-approximation scheme for maximum absolute error—the running time for both embodiments of the approximation schemes is roughly proportional to O((1/ε) N log² N B log B). Various embodiments of the present invention include efficient optimal and near-optimal algorithms for building wavelet synopses optimized for maximum-error metrics in one or multiple dimensions. These can be extended to handle a broad, natural class of distributive error metrics. This class includes several useful error measures for approximate query answers, such as mean weighted relative error and weighted L_(p)-norm error.

Wavelet Decomposition and Wavelet Data Synopses

Wavelets are a useful mathematical tool for hierarchically decomposing functions in ways that are both efficient and theoretically sound. Broadly speaking, the wavelet decomposition of a function consists of a coarse overall approximation together with detail coefficients that influence the function at various scales. The wavelet decomposition has excellent energy compaction and de-correlation properties, which can be used to effectively generate compact representations that exploit the structure of data. Furthermore, wavelet transforms can generally be computed in linear time.

One-Dimensional Haar Wavelets

Given the one-dimensional data vector A containing the N=8 data values A=[2, 2, 0, 2, 3, 5, 4, 4], the Haar wavelet transform of A is computed as follows. First, average the values together pairwise to get a new lower-resolution representation of the data with the following average values [2, 1, 4, 4]. In other words, the average of the first two values (i.e., 2 and 2) is 2, that of the next two values (i.e., 0 and 2) is 1, and so on. Some information is lost in this averaging process. To be able to restore the original values of the data array, some detail coefficients are stored to capture the missing information. In Haar wavelets, these detail coefficients are simply the differences of the second value in each averaged pair from the computed pairwise average. Thus, in this simple example, for the first pair of averaged values, the detail coefficient is 0 since 2−2=0, and for the second pair the detail coefficient is −1 since 1−2=−1. Note that no information has been lost in this process—it is fairly simple to reconstruct the eight values of the original data array from the lower-resolution array containing the four averages and the four detail coefficients. Recursively applying the above pairwise averaging and differencing process on the lower-resolution array containing the averages, we get the following full decomposition:

Resolution   Averages                    Detail Coefficients
3            [2, 2, 0, 2, 3, 5, 4, 4]    —
2            [2, 1, 4, 4]                [0, −1, −1, 0]
1            [3/2, 4]                    [1/2, 0]
0            [11/4]                      [−5/4]

The wavelet transform (also known as the wavelet decomposition) of A is the single coefficient representing the overall average of the data values followed by the detail coefficients in the order of increasing resolution. Thus, the one-dimensional Haar wavelet transform of A is given by W_(A)=[11/4, −5/4, 1/2, 0, 0, −1, −1, 0]. Each entry in W_(A) is called a wavelet coefficient. The main advantage of using W_(A) instead of the original data vector A is that for vectors containing similar values most of the detail coefficients tend to have very small values. Thus, eliminating such small coefficients from the wavelet transform (i.e., treating them as zeros) introduces only small errors when reconstructing the original data, resulting in a very effective form of lossy data compression.
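
For concreteness, the pairwise averaging/differencing process above can be transcribed into a few lines of Python. The following is an illustrative sketch only (it assumes the input length is a power of two, as in the example) and reproduces the transform of the running data vector:

```python
from fractions import Fraction

def haar_transform(values):
    """Unnormalized 1-D Haar transform via recursive pairwise averaging/differencing."""
    data = [Fraction(v) for v in values]   # exact arithmetic for the example
    details = []
    while len(data) > 1:
        pairs = list(zip(data[0::2], data[1::2]))
        averages = [(a + b) / 2 for a, b in pairs]
        # detail coefficient = pairwise average minus the second value = (a - b) / 2
        diffs = [(a - b) / 2 for a, b in pairs]
        details = diffs + details          # coarser-level details go in front
        data = averages
    return data + details

# Running example A = [2, 2, 0, 2, 3, 5, 4, 4]:
# yields [11/4, -5/4, 1/2, 0, 0, -1, -1, 0], i.e., the W_A discussed above.
print(haar_transform([2, 2, 0, 2, 3, 5, 4, 4]))
```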

Note that, intuitively, wavelet coefficients carry different weights with respect to their importance in rebuilding the original data values. For example, the overall average is more important than any detail coefficient since it affects the reconstruction of all entries in the data array. In order to equalize the importance of all wavelet coefficients, the final entries of W_(A) are normalized appropriately. A common normalization scheme is to divide each wavelet coefficient by √{square root over (2^(l))}, where l denotes the level of resolution at which the coefficient appears (with l=0 corresponding to the coarsest resolution level). Thus, the normalized coefficient c*_(i) is given by c*_(i) = c_(i)/√{square root over (2^(level(c_(i))))}.
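
The normalization step can likewise be sketched in a few lines; the level convention below (level 0 for c₀ and c₁, level ⌊log₂ i⌋ for c_(i) with i ≥ 1) is an assumption that matches the resolution levels of the running example:

```python
import math

def normalize(w):
    """Normalized coefficients c*_i = c_i / sqrt(2^level(c_i)).
    Assumed convention: level(c_0) = level(c_1) = 0; level(c_i) = floor(log2 i) for i >= 1."""
    return [c / math.sqrt(2 ** (0 if i == 0 else i.bit_length() - 1))
            for i, c in enumerate(w)]

# c_2 and c_3 are divided by sqrt(2); c_4 .. c_7 by 2.
print(normalize([2.75, -1.25, 0.5, 0, 0, -1, -1, 0]))
```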

Basic Haar Wavelet Properties and Notational Conventions

A helpful tool for exploring and understanding the key properties of the Haar wavelet decomposition is the error tree structure. The error tree is a hierarchical structure built based on the wavelet transform process (even though it is primarily used as a conceptual tool, an error tree can be easily constructed in linear O(N) time).

FIG. 1A depicts the error tree for the simple example data vector A. Each internal node c_(i) (i=0, . . . , 7) is associated with a wavelet coefficient value and each leaf d_(i) (i=0, . . . , 7) is associated with a value in the original data array; in both cases, the index i denotes the position in the (data or wavelet transform) array. For example, c₀ corresponds to the overall average of A. Note that the values associated with the error tree nodes c_(j) are the unnormalized coefficient values; the resolution levels l for the coefficients (corresponding to levels in the tree) are also depicted in FIG. 1A. The terms node, coefficient, and node/coefficient value are used interchangeably in what follows, even though they have distinct meanings. For ease of reference, Table 1 below summarizes the notation for some symbols with brief descriptions of their semantics. Detailed definitions of all these parameters are provided at appropriate locations in this description. For simplicity, the notation assumes one-dimensional wavelets—extensions to multi-dimensional wavelets are straightforward. Additional notation is introduced when necessary.

TABLE 1
Notation (index i ∈ {0 . . . N − 1})

Symbol                       Description
N                            Number of data-array cells
D                            Data-array dimensionality
B                            Space budget for synopsis
A, W_(A)                     Input data and wavelet transform arrays
d_(i)                        Data value for i^(th) data-array cell
{circumflex over (d)}_(i)    Reconstructed data value for i^(th) array cell
c_(i)                        Haar coefficient at index/coordinate i
T_(i)                        Error subtree rooted at node c_(i)
coeff(T_(i)), data(T_(i))    Coefficient/data values in T_(i) subtree
path(u)                      All non-zero proper ancestors of node u in the error tree
s                            Sanity bound for relative-error metric
relErr_(i), absErr_(i)       Relative/absolute error for data value d_(i)

Given a node u in an error tree T, let path(u) denote the set of all proper ancestors of u in T (i.e., the nodes on the path from u to the root of T, including the root but not u) with non-zero coefficients. A key property of the Haar wavelet decomposition is that the reconstruction of any data value d_(i) depends only on the values of coefficients on path(d_(i)); more specifically,

$\begin{matrix}{d_{i} = {\sum\limits_{c_{j} \in {\mathrm{path}(d_{i})}} \mathrm{sign}_{ij} \cdot c_{j}}} & (1)\end{matrix}$

where sign_(ij)=+1 if d_(i) is in the left child subtree of c_(j) or j=0, and sign_(ij)=−1 otherwise. Thus, reconstructing any data value involves summing at most log N+1 coefficients. For example, in FIG. 1A,

$d_{4} = c_{0} - c_{1} + c_{6} = {\frac{11}{4} - \left( { - \frac{5}{4}} \right) + \left( { - 1} \right)} = 3.$

The support region for a coefficient c_(i) is defined as the set of (contiguous) data values that c_(i) is used to reconstruct; the support region for a coefficient c_(i) is uniquely identified by its coordinate i.
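
Equation (1) corresponds to a simple top-down descent of the error tree. The following Python sketch (illustrative only; it assumes the dyadic array layout of FIG. 1A, with c_(2j) and c_(2j+1) the children of c_(j)) reproduces the computation of d₄ shown above:

```python
def reconstruct(w, i):
    """Reconstruct d_i from the unnormalized Haar coefficient array w,
    summing only the (at most log N + 1) coefficients on path(d_i)."""
    n = len(w)
    value = w[0]                  # the overall average c_0 always contributes with +
    j, lo, hi = 1, 0, n - 1
    while lo < hi:                # descend one resolution level per iteration
        mid = (lo + hi) // 2
        if i <= mid:
            value += w[j]         # d_i lies in c_j's left child subtree: sign +1
            j, hi = 2 * j, mid
        else:
            value -= w[j]         # right child subtree: sign -1
            j, lo = 2 * j + 1, mid + 1
    return value

w = [2.75, -1.25, 0.5, 0, 0, -1, -1, 0]
print(reconstruct(w, 4))          # 11/4 - (-5/4) + (-1) = 3.0
```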

Multi-Dimensional Haar Wavelets

The Haar wavelet decomposition can be extended to multi-dimensional data arrays using two distinct known methods, namely the standard and nonstandard Haar decomposition. Each of these transforms results from a natural generalization of the one-dimensional decomposition process described above, and both have been used in a wide variety of applications, including approximate query answering over high-dimensional DSS data sets.

As in the one-dimensional case, the Haar decomposition of a D-dimensional data array A results in a D-dimensional wavelet coefficient array W_(A) with the same dimension ranges and number of entries. Consider a D-dimensional wavelet coefficient W in the (standard or nonstandard) wavelet-coefficient array W_(A). W contributes to the reconstruction of a D-dimensional rectangular region of cells in the original data array A (i.e., W's support region). Further, the sign of W's contribution (+W or −W) can vary along the quadrants of W's support region in A.

As an example, FIG. 1B depicts the support regions and signs of the sixteen nonstandard, two-dimensional Haar coefficients in the corresponding locations of a 4×4 wavelet-coefficient array W_(A). The blank areas for each coefficient correspond to regions of A whose reconstruction is independent of the coefficient, i.e., the coefficient's contribution is 0. Thus, W_(A)[0,0] is the overall average that contributes positively (i.e., +W_(A)[0,0]) to the reconstruction of all values in A, whereas W_(A)[3, 3] is a detail coefficient that contributes (with the signs shown in FIG. 1B) only to values in A's upper right quadrant. Each data cell in A can be accurately reconstructed by adding up the contributions (with the appropriate signs) of those coefficients whose support regions include the cell. FIG. 1B also depicts the two levels of resolution (l=0,1) for the example two-dimensional Haar coefficients; as in the one-dimensional case, these levels define the appropriate constants for normalizing coefficient values.

Error-tree structures for multi-dimensional Haar wavelets can be constructed (once again in linear O(N) time) in a manner similar to those for the one-dimensional case, but their semantics and structure are somewhat more complex. A major difference is that, in a D-dimensional error tree, each node (except for the root, i.e., the overall average) actually corresponds to a set of 2^(D)−1 wavelet coefficients that have the same support region but different quadrant signs and magnitudes for their contribution. Furthermore, each (non-root) node t in a D-dimensional error tree has 2^(D) children corresponding to the quadrants of the (common) support region of all coefficients in t. The number of children (coefficients) for an internal error-tree node can actually be less than 2^(D) (resp., 2^(D)−1) when the sizes of the data dimensions are not all equal. In these situations, the exponent for 2 is determined by the number of dimensions that are active at the current level of the decomposition (i.e., those dimensions that are still being recursively split by averaging/differencing). The sign of each coefficient's contribution to the leaf (data) values residing at each of its children in the tree is determined by the coefficient's quadrant sign information.

As an example, FIG. 2 depicts the error-tree structure for the two-dimensional 4×4 Haar coefficient array in FIG. 1B. Thus, the (single) child t of the root node contains the coefficients W_(A)[0,1], W_(A)[1,0], and W_(A)[1,1], and has four children corresponding to the four 2×2 quadrants of the array; the child corresponding to the lower-left quadrant contains the coefficients W_(A)[0,2], W_(A)[2,0], and W_(A)[2,2], and all coefficients in t contribute with a plus sign (+) to all values in this quadrant.

Based on the above generalization of the error-tree structure to multiple dimensions, the formula for data-value reconstruction (Equation (1)) can naturally be extended to multi-dimensional Haar wavelets. Once again, the reconstruction of d_(i) depends only on the coefficient sets for all error-tree nodes in path(d_(i)), where the sign of the contribution for each coefficient W in node tεpath(d_(i)) is determined by the quadrant sign information for W.

Wavelet-Based Data Reduction: Coefficient Thresholding

Given a limited amount of storage for building a wavelet synopsis of the input data array A, a thresholding procedure retains a certain number B<<N of the coefficients in W_(A) as a highly-compressed approximate representation of the original data (i.e., the remaining coefficients are implicitly set to 0). The goal of coefficient thresholding is to determine the best subset of B coefficients to retain, so that some overall error measure in the approximation is minimized. The method of choice for the vast majority of earlier studies on wavelet-based data reduction and approximation is conventional coefficient thresholding that greedily retains the B largest Haar-wavelet coefficients in absolute normalized value. It is a well-known fact that this thresholding method is in fact provably optimal with respect to minimizing the overall root-mean-squared error (i.e., L₂-norm average error) in the data compression.

More formally, letting {circumflex over (d)}_(i) denote the (approximate) reconstructed data value for cell i, retaining the B largest normalized coefficients implies that the resulting synopsis minimizes the quantity

$\sqrt{\frac{1}{N}{\sum\limits_{i}\left( {d_{i} - {\hat{d}}_{i}} \right)^{2}}}$

(for the given amount of space B).
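
For reference, conventional L₂-optimal thresholding is a short procedure. The sketch below is illustrative only and reuses the normalization convention assumed earlier (level 0 for c₀ and c₁, level ⌊log₂ i⌋ otherwise):

```python
import math

def conventional_synopsis(w, B):
    """Retain the B largest coefficients by absolute *normalized* value,
    which provably minimizes the overall L2-norm error."""
    def normalized_magnitude(i):
        level = 0 if i == 0 else i.bit_length() - 1
        return abs(w[i]) / math.sqrt(2 ** level)
    keep = sorted(range(len(w)), key=normalized_magnitude, reverse=True)[:B]
    return {i: w[i] for i in keep}   # retained <index, value> pairs

print(conventional_synopsis([2.75, -1.25, 0.5, 0, 0, -1, -1, 0], B=3))
```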

Unfortunately, wavelet synopses optimized for overall L₂ error may not always be the best choice for approximate query processing systems. As observed in a recent study, such conventional wavelet synopses suffer from several important problems, including the introduction of severe bias in the data reconstruction and wide variance in the quality of the data approximation, as well as the lack of non-trivial guarantees for individual approximate answers. To address these shortcomings, that work introduces novel, probabilistic thresholding schemes based on randomized rounding that probabilistically round coefficients either up to a larger rounding value (to be retained in the synopsis) or down to zero. Intuitively, these probabilistic schemes assign each non-zero coefficient fractional storage yε(0,1] equal to its retention probability, and then flip independent, appropriately-biased coins to construct the synopsis. Existing strategies are based on trying to probabilistically control the maximum relative error (with an appropriate sanity bound s) in the approximation of individual data values based on the synopsis; that is, they attempt to minimize the quantity

${\max\limits_{i}\left\{ \frac{\left| {\hat{d}}_{i} - d_{i} \right|}{\max\left\{ {|d_{i}|, s} \right\}} \right\}},$

(where {circumflex over (d)}_(i) denotes the data value reconstructed based on the synopsis) with sufficiently high probability. More specifically, the existing algorithms are based on dynamic-programming (DP) formulations that explicitly minimize appropriate probabilistic metrics, such as the maximum normalized standard error (MinRelVar) or the maximum normalized bias (MinRelBias), in the randomized synopsis construction. These formulations are then combined with a quantization of the potential fractional-storage allotments to give combinatorial thresholding algorithms. Similar probabilistic schemes are also given for probabilistically controlling maximum absolute error.

Building and Using Wavelet Synopses in a DBMS

The construction of wavelet synopses typically takes place during a statistics collection process, whose goal is to create concise statistical approximations for the value distributions of either individual attributes or combinations of attributes in the relations of a DBMS. Statistics collection is usually an off-line process, carried out during night-time or other system-idling periods. Once created, such statistical summaries are typically stored as part of the DBMS-catalog information. More specifically, a wavelet synopsis comprising B wavelet coefficients is stored as a collection of B pairs {<i_(j), c_(i_(j))> : j=1, . . . , B}, where i_(j) and c_(i_(j)) denote the index and value (respectively) of the j^(th) retained synopsis coefficient.

Wavelet synopsis information in the DBMS catalog can be exploited for several different purposes. The primary (and more conventional) use of such summaries is as a tool for enabling effective (compile-time) estimates of the result sizes of relational operators for the purpose of cost-based query optimization. For instance, estimating the selectivity of a range predicate like l≦V≦h is equivalent to estimating the range summation f_(x)(l:h)=Σ_(i=l)^(h) f_(x)[i], where f_(x) is the frequency-distribution array for attribute X. It is not difficult to see that, given a B-coefficient synopsis of the f_(x) array, computing f_(x)(l:h) only involves coefficients in path(f_(x)[l]) ∪ path(f_(x)[h]) and, thus, can be estimated by summing only min{B, 2 log N+1} synopsis coefficients. A B-coefficient wavelet synopsis can also be easily expanded (in O(B) time) into an O(B)-bucket histogram (i.e., piecewise-constant) approximation of the underlying data distribution with several possible uses (e.g., as a data visualization/approximation tool). Finally, wavelet-coefficient synopses can enable very fast and accurate approximate query answers during interactive data-exploration sessions (e.g., initial, drill-down query sequences in data-mining tasks). In fact, efficient approximate query processing algorithms for complex select-project-join relational queries can be designed to operate entirely in the wavelet-coefficient domain.
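
To illustrate the range-summation point, the following Python sketch (an illustration under the same dyadic-layout assumption as before, not part of the disclosure) estimates f_(x)(l:h) directly from a sparse synopsis stored as <index, value> pairs; each coefficient contributes its value times the signed overlap of its support-region halves with [l, h]:

```python
def range_sum(synopsis, N, l, h):
    """Estimate the range sum f(l:h) from a sparse 1-D Haar synopsis
    given as a dict {coefficient index: coefficient value}."""
    def overlap(a, b):                       # |[l, h] intersect [a, b)|
        return max(0, min(h + 1, b) - max(l, a))
    total = 0.0
    for k, c in synopsis.items():
        if k == 0:
            total += c * (h - l + 1)          # overall average: + everywhere
            continue
        level = k.bit_length() - 1
        seg = N >> level                       # support-region size of c_k
        start = (k - (1 << level)) * seg
        total += c * (overlap(start, start + seg // 2)           # + on left half
                      - overlap(start + seg // 2, start + seg))  # - on right half
    return total

# Non-zero coefficients of the running example A = [2, 2, 0, 2, 3, 5, 4, 4]:
w = {0: 2.75, 1: -1.25, 2: 0.5, 5: -1.0, 6: -1.0}
print(range_sum(w, 8, 0, 3))                   # 2 + 2 + 0 + 2 = 6.0
```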

Deterministic Wavelet Thresholding

Rather than trying to probabilistically control maximum relative error through the optimization of probabilistic measures (like normalized standard error or bias), a more direct solution is to design deterministic thresholding schemes that explicitly minimize maximum-error metrics. Such schemes can not only guarantee better synopses, but they also avoid the potential pitfalls of randomized techniques, as well as the space-quantization requirement of some known methods, whose impact on the quality of the final solution is not entirely clear. Unfortunately, the existing DP formulations and algorithms depend crucially on the ability to assign fractional storage (i.e., retention probability) values in (0,1] to individual coefficients, which renders those schemes inapplicable for deterministic wavelet thresholding (where the storage assigned to each coefficient is either 0 or 1). In fact, one of the main open problems is whether it is possible to design efficient algorithms for deterministic Haar-coefficient thresholding for minimizing non-L₂ error metrics that are relevant for approximate query answers, such as maximum relative or absolute error in the data approximation. Embodiments of the present invention attack this problem for both one- and multi-dimensional Haar wavelets. Unlike the probabilistic schemes, embodiments of the present invention are applicable to a general class of approximation-error metrics, including, for example, mean weighted relative error.

One-Dimensional Wavelet Thresholding

One exemplary embodiment is a novel, simple, low polynomial-time scheme based on dynamic programming (DP) for building deterministic Haar-wavelet synopses that minimize the maximum relative or absolute error in the data-value approximation. More formally, let

${relErr}_{i} = \frac{\left| {\hat{d}}_{i} - d_{i} \right|}{\max\left\{ {|d_{i}|, s} \right\}}$

be an error metric that combines relative error with an appropriate sanity bound s and, similarly, let absErr_(i)=|{circumflex over (d)}_(i)−d_(i)| denote the absolute approximation error for the i^(th) data value. The deterministic wavelet thresholding problem is formally defined as follows.

Deterministic Maximum Relative/Absolute Error Minimization

Given a synopsis space budget B, determine a subset of at most B Haar-wavelet coefficients that minimizes the maximum relative (or absolute) error in the data-value approximation; that is, for errε{relErr, absErr}, minimize max_(iε{0, . . . , N−1}){err_(i)}.
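
The problem statement translates directly into a brute-force reference procedure, sketched below in Python for intuition only (it enumerates all coefficient subsets and is therefore exponential, unlike the DP developed next; the reconstruction descent assumes the dyadic layout of FIG. 1A):

```python
from itertools import combinations

def max_err(w, data, keep, s=1.0, relative=True):
    """Maximum relative (or absolute) error when only the coefficient
    indices in `keep` are retained (all others implicitly set to 0)."""
    N = len(w)
    worst = 0.0
    for i, d in enumerate(data):
        approx = w[0] if 0 in keep else 0.0
        j, lo, hi = 1, 0, N - 1
        while lo < hi:                         # walk path(d_i) down the error tree
            mid = (lo + hi) // 2
            sgn = 1 if i <= mid else -1
            if j in keep:
                approx += sgn * w[j]
            j = 2 * j if i <= mid else 2 * j + 1
            lo, hi = (lo, mid) if i <= mid else (mid + 1, hi)
        err = abs(approx - d)
        worst = max(worst, err / max(abs(d), s) if relative else err)
    return worst

def brute_force_min_max(w, data, B, **kw):
    """Exhaustive search over all subsets of at most B coefficients."""
    return min(max_err(w, data, set(c), **kw)
               for k in range(B + 1)
               for c in combinations(range(len(w)), k))
```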

Note that, unlike the probabilistic normalized standard error or normalized bias metrics employed in known works, the relErr and absErr metrics targeted by embodiments of the present invention do not have a simple monotonic/additive structure over the Haar-coefficient error tree. The key problem, of course, is that, even though data values are simple linear combinations of coefficients, each coefficient contributes with a different sign on different parts of the underlying data domain. This also implies that the earlier DP formulations are no longer applicable, since the assumed principle of optimality for error subtrees is no longer valid. In other words, due to the absence of additive/monotonic structure for the objective, the optimal solution for the subtree rooted at node c_(j) is not necessarily a combination of the optimal (partial) solutions for c_(j)'s child subtrees.

Exemplary embodiments of a DP recurrence and algorithm for a deterministic maximum-error optimization problem condition the optimal error value for an error subtree not only on the root node c_(j) of the subtree and the amount of synopsis storage allotted, but also on the error that enters that subtree through the coefficient selections made on the path from the root to node c_(j) (excluding c_(j) itself), i.e., coefficient selections on path(c_(j)). Since the depth of the error tree is O(log N), all such possible selections can be tabulated while keeping the running time of the algorithm in the low-polynomial range.

More formally, let B denote the total space budget for the synopsis, and let T_(j) be the subtree of the error tree rooted at node c_(j), with coeff(T_(j)) (resp., data(T_(j))) denoting the set of coefficient (resp., data) values in T_(j). Finally, let M[j,b,S] denote the optimal (i.e., minimum) value of the maximum error (relative or absolute) among all data values in T_(j), assuming a synopsis space budget of b coefficients for the T_(j) subtree and that a subset S⊆path(c_(j)) (of size at most min{B−b, log N+1}) of proper ancestors of c_(j) has been selected for the synopsis; that is, assuming a relative-error metric (i.e., err=relErr),

${{M\left\lbrack {j,b,S} \right\rbrack} = {\min\limits_{S_{j} \subseteq {{{{coeff}{(T_{j})}} \cdot {S_{j}}} \leq b}}\left\{ {\max\limits_{d_{i} \in {{data}{(T_{j})}}}{relErr}_{i}} \right\}}},{where}$${relErr}_{i} = {\frac{{d_{i} - {\sum_{{ck} \in {{{path}{(d_{i})}}\bigcap{({S_{j}\bigcup S})}}}{{sign}_{ik} \cdot c_{k}}}}}{\max \left\{ {{d_{i}},S} \right\}}.}$

The case for absolute error (i.e., err=absErr) is defined similarly. A DP recurrence for computing the M[j,b,S] entries is formulated as follows. M[0,B,∅] gives the desired optimal error value at the root node of the error tree. The corresponding error-optimal wavelet synopsis is then built by simply retracing the choices of the DP computations using known techniques.

The base case for the recurrence occurs for data (i.e., leaf) nodes in the Haar error tree; that is, for c_(j)=d_(j−N) with j≧N (see FIG. 1A). In this case, M[j,b,S] is not defined for b>0, whereas for b=0 and for each subset S⊆path(d_(j−N)) (of size ≦ min{B, log N+1}) define

${{M\left\lbrack {j,0,S} \right\rbrack} = \frac{{d_{j} - N - {\sum_{{ck} \in S}{{sign}_{{j - N},k} \cdot c_{k}}}}}{r}},$

where r=max{|d_(j−N)|, s} for err=relErr, and r=1 for err=absErr.

In the case of an internal error-tree node c_(j) with j<N, the DP algorithm has two distinct choices when computing M[j,b,S], namely either drop coefficient c_(j) or keep it in the final synopsis. If we choose to drop c_(j) from the synopsis, then it is easy to see that the maximum error from c_(j)'s two child subtrees (i.e., c_(2j) and c_(2j+1)) will be propagated upward; thus, the minimum possible maximum error M[j,b,S] for T_(j) in this case is simply

$\begin{matrix}{\min\limits_{0 \leq b^{\prime} \leq b} \max\left\{ {M\left\lbrack {2j, b^{\prime}, S} \right\rbrack, \; M\left\lbrack {2j+1, b-b^{\prime}, S} \right\rbrack} \right\}.} & (2)\end{matrix}$

In the above recurrence, S contains proper ancestors of c_(j) that are clearly proper ancestors of c_(2j) and c_(2j+1) as well; thus, the right-hand side of the recurrence is well-defined as the DP computation proceeds bottom-up to reach c_(j). If, on the other hand, c_(j) is kept in the synopsis (assuming, of course, b≧1), the least possible error M[j,b,S] for T_(j) is computed as

$\begin{matrix}{\min\limits_{0 \leq b^{\prime} \leq {b-1}} \max\left\{ {M\left\lbrack {2j, b^{\prime}, {S \cup \{c_{j}\}}} \right\rbrack, \; M\left\lbrack {2j+1, b-b^{\prime}-1, {S \cup \{c_{j}\}}} \right\rbrack} \right\}.} & (3)\end{matrix}$

Again, the right-hand side of the recurrence is well-defined. The final value for M[j,b,S] is defined as the minimum of the two possible choices for coefficient c_(j), i.e., equations (2) and (3) above. A pseudo-code description of an exemplary embodiment of an optimal DP algorithm for deterministic maximum-error thresholding in one dimension (termed MinMaxErr) is shown in Table 2 below.

TABLE 2
An exemplary embodiment of the MinMaxErr Algorithm: Optimal Deterministic Thresholding for Maximum Error in One Dimension.

procedure MinMaxErr(W_(A), B, root, S, err)
Input: Array W_(A) = [c₀, . . . , c_(N-1)] of N Haar wavelet coefficients, space budget B (number of retained coefficients), error-subtree root-node index root, subset of retained ancestors of root node S ⊂ path(root), target maximum-error metric err.
Output: Value of M[root,B,S] according to the optimal dynamic program (M[root,B,S].value), decision made for the root node (M[root,B,S].retained), and space allotted to left child subtree (M[root,B,S].leftAllot). (The last two are used for retracing the optimal solution to build the synopsis.)
begin
1.  if (M[root,B,S].computed = true) then
2.    return M[root,B,S].value            // optimal value already in M[]
3.  if (N ≦ root < 2N) then               // leaf/data node
4.    if (B = 0) then
5.      M[root,B,S].value := |d_(root−N) − Σ_(c_(k)εS) sign_(root−N, k) · c_(k)|
6.      if (err = relErr) then
7.        M[root,B,S].value := M[root,B,S].value / max{|d_(root−N)|, s}
8.      endif
9.  else
10.   M[root,B,S].value := ∞
11.   for b := 0 to B step 1 do           // first choice: drop root
12.     left := MinMaxErr(W_(A), b, 2*root, S, err)
13.     right := MinMaxErr(W_(A), B − b, 2*root + 1, S, err)
14.     if (max{left, right} < M[root,B,S].value) then
15.       M[root,B,S].value := max{left, right}
16.       M[root,B,S].retained := false
17.       M[root,B,S].leftAllot := b
18.     endif
19.   endfor
20.   for b := 0 to B − 1 step 1 do       // second choice: keep root
21.     left := MinMaxErr(W_(A), b, 2*root, S ∪ {root}, err)
22.     right := MinMaxErr(W_(A), B − b − 1, 2*root + 1, S ∪ {root}, err)
23.     if (max{left, right} < M[root,B,S].value) then
24.       M[root,B,S].value := max{left, right}
25.       M[root,B,S].retained := true
26.       M[root,B,S].leftAllot := b
27.     endif
28.   endfor
29. endif
30. M[root,B,S].computed := true
31. return (M[root,B,S].value)
end
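
The Table 2 pseudo-code admits a compact executable transcription. The following Python sketch is an illustrative reconstruction (not the patented implementation): it memoizes M[j,b,S] exactly as in the recurrence, represents S as a frozenset of retained ancestor indices, and assumes the dyadic error-tree layout of FIG. 1A for the sign computation:

```python
from functools import lru_cache

def min_max_err(w, data, B, s=1.0, relative=True):
    """Minimum possible maximum relative (or absolute) error with at most
    B retained coefficients; w = unnormalized Haar coefficients."""
    N = len(w)

    def sign(i, k):
        # Sign of c_k's contribution to data value d_i (k assumed on path(d_i)).
        if k == 0:
            return 1
        level = k.bit_length() - 1
        seg = N >> level                        # size of c_k's support region
        offset = i - (k - (1 << level)) * seg
        return 1 if offset < seg // 2 else -1

    @lru_cache(maxsize=None)
    def M(j, b, S):                             # S: frozenset of retained ancestors
        if j >= N:                              # leaf/data node d_{j-N}
            if b != 0:
                return float('inf')             # M[j,b,S] undefined for b > 0
            i = j - N
            err = abs(data[i] - sum(sign(i, k) * w[k] for k in S))
            return err / max(abs(data[i]), s) if relative else err
        if j == 0:                              # root c_0 has the single child c_1
            best = M(1, b, S)                   # drop c_0
            if b >= 1:
                best = min(best, M(1, b - 1, S | {0}))   # keep c_0
            return best
        best = float('inf')
        for b1 in range(b + 1):                 # first choice: drop c_j
            best = min(best, max(M(2 * j, b1, S), M(2 * j + 1, b - b1, S)))
        for b1 in range(b):                     # second choice: keep c_j (costs 1)
            S2 = S | {j}
            best = min(best, max(M(2 * j, b1, S2), M(2 * j + 1, b - b1 - 1, S2)))
        return best

    return M(0, B, frozenset())

# Example: min_max_err([2.75, -1.25, 0.5, 0, 0, -1, -1, 0],
#                      [2, 2, 0, 2, 3, 5, 4, 4], B=3)
```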

FIG. 3 shows the input and output for the exemplary method of optimal deterministic thresholding for maximum error in one dimension (MinMaxErr) 300, for which pseudo-code is shown in Table 2 above. The input includes a wavelet array (W_(A)) 302, a number of retained coefficients (B) 304, a subtree root index (j) 306, and a subset of retained coefficients (S) 308 on a path from the root to j 306. The output includes an array M[j,B,S], which is the optimal (minimum) maximum relative error for all data values in the j-subtree for a synopsis of size B, assuming the coefficients in S are retained above j.

FIGS. 4A-4C are flowcharts describing the exemplary method (MinMaxErr) 300 of FIG. 3. In this exemplary method, procedure MinMaxErr starts at 400 and determines whether M[j,B,S] has already been computed at 402. If so, M[j,B,S] is returned at 404; otherwise, it is determined whether j is a leaf node at 406. If so, M[j,B,S] is computed using a formula when B is zero (and is undefined otherwise) and returned at 408; otherwise, OPT is set to infinity and b to 0 at 410.

Looping over b from zero to B 412, 414, 416, left is set to the result of calling MinMaxErr 400 with one set of variables, while right is set to the result of calling MinMaxErr 400 with a different set of variables at 418. Thus, the procedure is called recursively within the loop. Next, it is determined whether the maximum of left and right is less than OPT at 420. If so, then OPT is set to the maximum of left and right at 422, and the loop continues until b is not less than B at 416.

Then, b is initialized to zero at 424 outside another loop over b from zero to B−1 426, 428, 430. Again, left is set to the result of calling MinMaxErr 400 with one set of variables, while right is set to the result of calling MinMaxErr 400 with a different set of variables at 432. Next, it is determined whether the maximum of left and right is less than OPT at 434. If so, then OPT is set to the maximum of left and right at 436, and the loop continues until b is not less than B at 430.

Time and Space Complexity

Given a node/coefficient c_(j) at level l of the error tree, the exemplary MinMaxErr algorithm considers at most B+1 space allotments to the T_(j) subtree (where the "+1" accounts for the possibility of zero space) and at most 2^(l+1) subsets of ancestors of c_(j). Thus, the number of entries in the DP array M[] that need to be computed for c_(j) is O(B·2^(l+1)). Furthermore, the time needed to compute each such entry is O(log B). To see this, note that for any fixed node k and ancestor subset S, M[k,b′,S] is a decreasing function of the space allotment b′. Thus, the optimal distribution point b′ in equations (2) and (3) (Steps 11-28 in MinMaxErr) is computed using an O(log B)-time binary-search procedure, looking for the space allotment where the error values for the two child subtrees are equal, or the adjacent pair of cross-over allotments. (To simplify the description, this binary-search procedure is not shown in the pseudo-code description in Table 2.) Clearly, the total number of error-tree nodes at level l is 2^(l), making the overall time complexity of the exemplary DP algorithm

${O\left( {\sum\limits_{i = 0}^{{lo}\; g\; N}{2^{l}2^{l + 1}B\; \log \; B}} \right)} = {{O\left( {B\; \log \; B{\sum\limits_{i = 0}^{{lo}\; {gN}}2^{2l}}} \right)} = {{O\left( {N^{2}B\; \log \; B} \right)}.}}$

Given an internal node c_(j) at level l of the error tree and a synopsis budget B, the number of coefficients that can be retained inside the T_(j) subtree is actually upper bounded by min{B, 2^(log N−l)−1} (since T_(j) comprises at most 2^(log N−l)−1 coefficient nodes). Thus, the number of entries for c_(j) in the M[] array is at most O(2^(l+1)·min{B+1, 2^(log N−l)})=O(min{B·2^(l+1), 2N}), with each entry requiring O(min{log B, log N−l}) computation time. Summing across all error-tree nodes (as above) using these tighter bounds gives an overall time complexity of only O(N²) for the exemplary embodiment of the DP algorithm.

With respect to the space requirements of exemplary embodiments of the scheme, it is easy to see that the overall size of the DP array M[] is

${{O\left( {\sum\limits_{l = 0}^{{lo}\; {gN}}{2^{l}\min \left\{ {{B\; 2^{l + 1}},{2N}} \right\}}} \right)} \leq {O\left( {N\; {\sum\limits_{l = 0}^{{lo}\; {gN}}2^{l}}} \right)}} = {{O\left( N^{2} \right)}.}$

Note, however, that the exemplary MinMaxErr algorithm does not actually require the entire DP array to be memory-resident at all times. For instance, the results for all descendants of a node c_(j) are no longer needed and can be swapped out of memory once the results for node c_(j) have been computed. With this small optimization, it is easy to see that the exemplary bottom-up DP computation never requires more than one active line (i.e., entries corresponding to a single tree node) of the M[] array per error-tree level, where the size of such a line for a node at level l is O(min{B·2^(l+1), 2N}). Thus, the size of the memory-resident working set for the exemplary DP algorithm drops to only O(Σ_(l=0)^(log N) min{B·2^(l+1), 2N}) = O(N min{B, log N}).

In summary, for the one-dimensional case, the exemplary MinMaxErr algorithm is an optimal deterministic thresholding scheme for building one-dimensional wavelet synopses that minimizes the maximum relative error (or maximum absolute error) in the data approximation. MinMaxErr runs in time O(N²) and has a total-space (working-space) requirement of O(N²) (resp., O(N min{B, log N})).

Multi-Dimensional Wavelet Thresholding

The deterministic wavelet thresholding problem becomes significantly more complex for multi-dimensional wavelets, and directly extending the exemplary embodiment of the optimal one-dimensional DP formulation to the case of multiple dimensions fails to give a practical solution. As described above, in the D-dimensional error-tree structure, even though the tree depth remains O(log N), each node in the tree now contains up to 2^(D)−1 wavelet coefficients with the same support region and different quadrant signs (FIG. 2). This implies that the total number of possible ancestor subsets S for a multi-dimensional coefficient at a level l=Θ(log N) is O(2^(log N·(2^(D)−1)))=O(N^(2^(D)−1)), rendering the exhaustive-enumeration DP scheme described above completely impractical, even for the relatively small data dimensionalities (i.e., D=2-5) where wavelet-based data reduction is typically employed. It is well known that, due to the dimensionality curse, wavelets and other space-partitioning schemes become ineffective above 5-6 dimensions.
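
To make the blow-up concrete, instantiating this bound for the small dimensionalities just cited gives

$D = 2: \; O\left( N^{2^{D}-1} \right) = O\left( N^{3} \right); \qquad D = 3: \; O\left( N^{7} \right); \qquad D = 4: \; O\left( N^{15} \right);$

that is, the exhaustive tabulation of ancestor subsets is already cubic in the domain size at D=2.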

There are two exemplary embodiments of efficient polynomial-time approximation schemes for deterministic multi-dimensional wavelet thresholding for maximum-error metrics. Both approximation schemes are based on approximate dynamic programs that explore a much smaller number of options than the optimal DP formulation, while offering tunable ε-approximation guarantees for the final target maximum-error metric. More specifically, the first scheme can give ε-additive-error guarantees for maximum relative or absolute error, whereas the second scheme is a (1+ε)-approximation algorithm for maximum absolute error.

An ε-Additive-Error Approximation Scheme for Absolute/Relative ErrorMinimization

Intuitively, the optimal one-dimensional DP scheme is based on exhaustively enumerating, for each error subtree, all the possible error values entering the subtree through the choices made on the path to the subtree's root node. The first approximation scheme avoids this exhaustive enumeration and, instead, tries to approximately cover the range of all possible error contributions for paths up to the root node of an error subtree using a much smaller number of (approximate) error values. The exemplary embodiment of the approximation scheme is then based on a much sparser DP formulation, which only tabulates this smaller set of error values. Specifically, let R denote the maximum absolute coefficient value in the error tree and let ε<1 denote the desired approximation factor. The additive contribution to the absolute data-value reconstruction error from any possible path in the error tree is guaranteed to lie in the range [−R·2^(D) log N, +R·2^(D) log N]. The approximation scheme covers the entire range using error-value breakpoints of the form ±(1+ε)^(k), for a range of (contiguous) integer values for the exponent k. Note that the number of such breakpoints needed is essentially

$O\left( {\log_{1+\varepsilon}\left( {2R\, 2^{D} \log N} \right)} \right) \approx O\left( \frac{D + {\log R} + {\log\log N}}{\varepsilon} \right)$

for small values of ε<1; in other words,

$k \in K = \left\{ {0, 1, \ldots, O\left( \frac{D + {\log R} + {\log\log N}}{\varepsilon} \right)} \right\}.$

Now, let round_(ε)(v) be a function that rounds any value vεR down (in magnitude) to the closest value in the set E={0}∪{±(1+ε)^(k), kεK}; that is, letting l=⌊log_(1+ε)|v|⌋, we have round_(ε)(v)=(1+ε)^(l) if v≧1, round_(ε)(v)=−(1+ε)^(l) if v≦−1, and 0 otherwise. The DP array M^(a)[] for the approximation scheme tabulates the values M^(a)[j,b,e] capturing the approximate maximum error (relative or absolute) in the T_(j) error subtree (rooted at node j), assuming a space budget of b coefficients allotted to T_(j) and an approximate/rounded additive error contribution of eεE due to proper ancestors of node j being discarded from the synopsis.
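
A direct transcription of round_(ε) is given below as a Python sketch under the definitions just stated (the set K is left implicit; magnitudes below the smallest breakpoint (1+ε)⁰ = 1 round to 0):

```python
import math

def round_eps(v, eps):
    """Round v down in magnitude to the closest value in
    E = {0} union {±(1 + eps)^k : k in K}, for eps > 0."""
    if abs(v) < 1:
        return 0.0
    k = math.floor(math.log(abs(v), 1 + eps))   # k = floor(log_{1+eps} |v|)
    return math.copysign((1 + eps) ** k, v)
```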

The base case for the computation of M^(a)[], i.e., the case of a leaf/data node j≧N in the error tree, is fairly straightforward; once again, M^(a)[j,b,e] is only defined for b=0 and

${{M^{a}\left\lbrack {j,b,e} \right\rbrack} = \frac{e}{r}},$

where r is either max{|d_(j−N)|, s} or 1, depending on whether a relative or absolute error metric is being targeted.

In the case of an internal error-tree node j, each node now corresponds to a set S(j) of at most 2^(D)−1 (non-zero) coefficients and has at most 2^(D) child subtrees (with indices, say, j₁, . . . , j_(m)). Assume that a subset s⊆S(j) of node j's coefficients is chosen to be maintained in the synopsis and, for each coefficient cεS(j), let sign(c, j_(i)) denote the sign of c's contribution to the j_(i) child subtree; then, the least possible maximum-error entries M^(a)[j,b,e], eεE, are estimated for T_(j) (assuming, of course, that b≧|s|) as

$\min\limits_{0 \leq {b_{1} + \cdots + b_{m}} \leq {b - |s|}} \; \max\limits_{1 \leq i \leq m} \left\{ M^{a}\left\lbrack {j_{i}, b_{i}, \mathrm{round}_{\varepsilon}\left( {e + \sum\limits_{c \in {S(j) - s}} \mathrm{sign}\left( {c, j_{i}} \right) \cdot c} \right)} \right\rbrack \right\}.$

In other words, for a given selected coefficient subset s, all possible allotments of the remaining b−|s| space to children of node j are considered, with the rounded cumulative error that enters those children taking into account the contribution from the dropped coefficients in S(j)−s. To avoid the O(B^(2^(D))) factor in run-time complexity implied by the search over all possible space allotments b₁, . . . , b_(m) to child subtrees in the above recurrence, the search is simply ordered among a node's children. The basic idea is to generalize the approximate DP-array entries to M^(a)[j,b,e], where j=(j₁, . . . , j_(k)) is a list of error-tree nodes and e=(e₁, . . . , e_(k)) is a list of incoming additive errors corresponding to the nodes in j. The above recurrence for M^(a)[(j),b,(e)] then becomes simply

${{M^{a}\left\lbrack {(j),b,(e)} \right\rbrack} = {\min\limits_{s \subseteq {S{(j)}}}\left\{ {M^{a}\left\lbrack {\left( {j_{2},\ldots \mspace{14mu},j_{m}} \right),{b - {s}},\left( {e_{1},{\ldots \mspace{14mu} e_{m}}} \right)} \right\rbrack} \right\}}},{{{where}\mspace{14mu} e_{k}} = {{{{round}_{ɛ}\left( {e + {\sum\limits_{c \in {{S{(j)}} - s}}{{{sign}\left( {c,j_{k\;}} \right)} \cdot c}}} \right)}\mspace{14mu} {for}\mspace{14mu} k} = 1}},\ldots \mspace{14mu},{m\mspace{14mu} {and}}$${M^{a}\left\lbrack {\left( {j_{1},\ldots \mspace{14mu},j_{m}} \right),b^{\prime},\left( {e_{1},\ldots \mspace{14mu},e_{m}} \right)} \right\rbrack} = {\min\limits_{0 \leq b^{''} \leq b^{\prime}}{\max {\begin{Bmatrix}{{M^{a}\left\lbrack {\left( j_{1} \right),b^{''},\left( e_{1} \right)} \right\rbrack},} \\{M^{a}\begin{bmatrix}{\left( {j_{2},\ldots \mspace{14mu},j_{m}} \right),} \\\begin{matrix}{{b^{\prime} - b^{''}},} \\\left( {e_{2},\ldots \mspace{14mu},e_{m}} \right)\end{matrix}\end{bmatrix}}\end{Bmatrix}.}}}$

Intuitively, the first equation states that the approximate error value M^(a)[(j),b,(e)] at node j is computed as the minimum (over all possible choices for the subset s⊆S(j) of retained coefficients at j) of the corresponding approximate error values for the list of node j's children. The second equation then computes the approximate error values for this list (j₁, . . . , j_(m)) by exploring all possible allotments of a given space budget b′ between the first child (j₁) and the list suffix of all remaining children (j₂, . . . , j_(m)) of node j.

To see how this generalization affects the space complexity of the exemplary DP formulation, note that, for a given node j in the error tree: (1) there are at most 2^(D) possible different values for the list suffix j of child error-tree nodes of j; and (2) for each such list suffix j and each possible incoming error value eεE at node j, there are at most 2^(2^(D)−1) possible different values for the corresponding list of child error values e (i.e., one for each possible subset s⊆S(j) of retained coefficients at j). Thus, given that there are obviously O(B) choices for the b parameter, it is easy to see that the overall space requirements of the exemplary generalized DP array are O(|E|·2^(2^(D)+D)·NB). Again, D is a small constant, typically between 2-5. In terms of time complexity, note that the exemplary generalization also allows for an O(log B)-time search for the breakup of a node's allotment to its children (using binary search in the above recurrence). Thus, the overall time complexity of the exemplary DP is O(|E|·2^(2^(D)+D)·NB log B).

In summary, the above-described approximation scheme for deterministic multi-dimensional wavelet thresholding discovers an approximate solution that is guaranteed to be within a worst-case additive error of

$ɛ\; {R\left( {{{resp}.},{ɛ\; \frac{R}{S}}} \right)}$

of the optimal (i.e., minimum) maximum absolute (resp., relative) error in time

$O\left( {\frac{D + {\log \; R} + {\log \; \log \; N}}{ɛ}2^{2^{D} + {2D}}N\; \log^{2}N\; \log \; B} \right)$

and with a total space requirement of

${O\left( {\frac{D + {\log \; R} + {\log \; \log \; N}}{ɛ}2^{{2D} + {2D}}{NB}\; \log^{2}N} \right)}.$

Proof: The goal is to demonstrate that the exemplary approximation scheme produces a solution that is within a small additive error of the optimal solution. Only the case of absolute error is considered, as the argument for relative error is similar. Let M[j,b,e]=M[(j₁, . . . , j_(m)), b, (e₁, . . . , e_(m))] denote the optimal absolute error value when allowed a coefficient synopsis of size b in the list of error subtrees (T_(j₁), . . . , T_(j_(m))), assuming an incoming additive error of e_(k) at subtree T_(j_(k)) (due to its discarded ancestor coefficients), where k=1, . . . , m. Thus, the goal is to upper-bound the absolute difference

|M^(a)[(root),B,(0)]−M[(root),B,(0)]|.

Let the height of an error-tree node denote its distance from the error-tree leaf (i.e., data) nodes (i.e., the leaves are at height 0). Induction is used to demonstrate the following claim. Claim: if the tree nodes in list j=(j₁, . . . , j_(m)) are at height i, then, for any space allotment b and error-value list e,

|M^(a)[j,b,e]−M[j,b,e]|≦(i+1)·ε2^(D) R log N.

Let N(j) denote the number of nodes in the subtrees rooted at all nodes in j; that is, N(j)=|T_(j₁) ∪ . . . ∪ T_(j_m)|. The proof of the above claim is based on an inductive argument on N(j). For the base case, note that if j comprises only a leaf node, then the claim is clearly true by the construction of the exemplary approximate breakpoint set E. Now, assume that the claim is true for all lists j′ with N(j′)<n, and let j be a node list such that N(j)=n; also, assume that the nodes in j are at height i in the error tree.

Consider first the simpler case where j comprises a single node j, i.e., j=(j), and let (j₁, . . . , j_(m)) denote the list of children of node j. Recall that

$M^{a}[(j),b,(e)] = \min_{s \subseteq S(j)} \left\{ M^{a}\left[ (j_{1},\ldots,j_{m}),\ b-|s|,\ (e_{1},\ldots,e_{m}) \right] \right\} \qquad (4)$

where e_(k)=round_(ε)(e+Σ_(c∈S(j)−s) sign(c, j_(k))·c) for k=1, . . . , m. Now, let e′_(k)=e+Σ_(c∈S(j)−s) sign(c, j_(k))·c denote the corresponding unrounded incoming errors for k=1, . . . , m. It is easy to see that the corresponding optimal error value at node j is defined similarly as

$\begin{matrix}{{M\left\lbrack {(j),b,(e)} \right\rbrack} = {\min\limits_{s \subseteq {S{(j)}}}{\left\{ {M\left\lbrack {\left( {j_{1},\ldots \mspace{14mu},j_{m}} \right),{b - {s}},\left( {e_{1}^{\prime},\ldots \mspace{14mu},e_{m}^{\prime}} \right)} \right\rbrack} \right\}.}}} & (5)\end{matrix}$

For the sake of brevity, let j_(c)=(j₁, . . . , j_(m)), e=(e₁, . . . , e_(m)), and e′=(e′₁, . . . , e′_(m)). By the inductive hypothesis, for any choice of the s subset,

|M^(a)[j_(c),b−|s|,e]−M[j_(c),b−|s|,e]|≦i·ε2^(D) R log N.  (6)

Now, suppose the optimal error value for j_(c) is to be computed with the unrounded incoming additive errors, i.e., M[j_(c),b−|s|,e′]. Clearly, an upper bound for this error can be obtained by retaining those coefficients corresponding to the optimal value with the rounded incoming errors (i.e., e) and then adding the rounding error |e_(k)−e′_(k)| to all the leaf nodes in the subtree rooted at node j_(k) (k=1, . . . , m). In other words,

|M[j_(c),b−|s|,e]−M[j_(c),b−|s|,e′]|≦ε2^(D) R log N.  (7)

Now, a simple application of the triangle inequality over inequalities (6) and (7) gives

|M^(a)[j_(c),b−|s|,e]−M[j_(c),b−|s|,e′]|≦(i+1)·ε2^(D) R log N

which, combined with equations (4) and (5), guarantees that |M^(a)[(j),b,(e)]−M[(j),b,(e)]|≦(i+1)·ε2^(D) R log N. This completes the proof of the claim for the case of a single-node list j=(j).

For the case of a multi-node list j=(j₁, . . . , j_(m)), where m>1, recall that

${{M^{a}\left\lbrack {j,b,e} \right\rbrack} = {\min\limits_{0 \leq b^{\prime} \leq b}{\max \begin{Bmatrix}{{M^{a}\left\lbrack {\left( j_{1} \right),b^{\prime},\left( e_{1} \right)} \right\rbrack},} \\{M^{a}\left\lbrack {\left( {j_{2},\ldots \mspace{14mu},j_{i}} \right),{b - b^{\prime}},\left( {e_{2},\ldots \mspace{14mu},e_{1}} \right)} \right\rbrack}\end{Bmatrix}}}},$

with the exact same relation holding for the optimal error values (using M[ ] instead of M^(a)[ ] above). Thus, once again, applying the inductive hypothesis and the properties of the exemplary error-rounding procedure to the right-hand side of the above expression gives |M^(a)[j,b,e]−M[j,b,e]|≦(i+1)·ε2^(D) R log N. Thus, the claim holds for this case as well.

Now, a simple application of the claim with j=(root) gives

|M^(a)[(root),B,(0)]−M[(root),B,(0)]|≦(1+log N)·ε2^(D) R log N=O(ε2^(D) R log² N).

Thus, simply setting

$ɛ^{\prime} = \frac{ɛ}{O\left( {2^{D}\log^{2}N} \right)}$

gives a worst-case additive deviation of εR from the optimal maximum absolute error value. The running-time and space complexity of the exemplary approximation scheme for the above value of ε′ follow immediately from the earlier discussion.
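Spelling out the substitution, the root-level bound with ε′ in place of ε becomes

$\left| M^{a}[(root),B,(0)] - M[(root),B,(0)] \right| \leq O\left( ε'\, 2^{D} R \log^{2}N \right) = O\left( \frac{ε}{2^{D}\log^{2}N} \cdot 2^{D} R \log^{2}N \right) = O(εR).$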

The bounds represent a truly pathological worst-case scenario for the scheme, in which all coefficients on a root-to-leaf path are of maximum absolute value R. In real-life applications, most of the energy of a data signal (i.e., array) is typically concentrated in a few large wavelet coefficients, which implies that most coefficient values in the error tree will be (much) smaller than R. Thus, in practical scenarios, the approximation scheme comes much closer to the optimal solution than the worst-case deviation bounds suggest.

FIG. 5 shows input and output for an exemplary embodiment of an ε-additive error approximation scheme for absolute/relative error minimization for deterministic multi-dimensional wavelet thresholding (AddMultIDErr) 500. Inputs include the wavelet array (W_(A)) 502, a number of retained coefficients (B) 504, a list of subtree-root indexes (j₁, . . . , j_(m)) 506, and a list of incoming (rounded) error values (e₁, . . . , e_(m)) 508. Output includes array M[(j₁, . . . , j_(m)),B,(e₁, . . . , e_(m))], which is the approximate minimum maximum relative error for all data values in the subtrees rooted at (j₁, . . . , j_(m)) for synopsis space B, assuming corresponding incoming error values (e₁, . . . , e_(m)). The notation includes round_(ε)(v), which rounds the value v down to the closest power of (1+ε), i.e., ±(1+ε)^(k).
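As a concrete (purely illustrative) reading of the round_(ε)(v) notation, rounding a value down to the closest power of (1+ε) while preserving its sign can be sketched as:

    import math

    def round_eps(v, eps):
        """Round |v| down to the closest power of (1 + eps), preserving sign."""
        if v == 0:
            return 0.0
        k = math.floor(math.log(abs(v), 1 + eps))  # largest k with (1+eps)^k <= |v|
        return math.copysign((1 + eps) ** k, v)

The rounded values ±(1+ε)^(k) form the approximate breakpoint set E, whose size grows only logarithmically in the error range for a fixed ε.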

FIGS. 6A and 6B are flowcharts describing the exemplary embodiment of the scheme AddMultIDErr of FIG. 5. In this example, procedure AddMultIDErr starts at 600. It is determined whether array M[(j₁, . . . , j_(m)),B,(e₁, . . . , e_(m))] is already computed at 602. If so, then the resulting array is returned at 604; otherwise, it is determined whether m=1 and j₁ is a leaf node at 606. If so, then the array M[j₁,B,e₁] is computed using a formula if b equals zero, otherwise the array is set to undefined, and the array is returned at 608. It is determined whether m=1 at 610. If so, then control flows to 612; otherwise, to 614. At 612, computations are performed and array M[(j₁),B,(e₁)]=OPT is returned. At 614, other computations are performed and array M[(j₁, . . . , j_(m)),B,(e₁, . . . , e_(m))]=OPT is returned.

Approximation Scheme for Absolute Error Minimization

For the special case of minimizing absolute error for deterministic multi-dimensional wavelet thresholding, there is an exemplary embodiment of a polynomial-time (1+ε)-approximation scheme. Assume that all wavelet coefficients are integers, which can always be satisfied by appropriately scaling the coefficient values. For example, for integer data values d_(i) (e.g., a multi-dimensional frequency count array), scaling by a factor of O(2^(D log N))=O(N^(D)) is always guaranteed to give integer Haar coefficients. Let R_(Z) denote the maximum (scaled) coefficient value in the error tree; as previously, it is easy to see that the additive (integer) contribution to the absolute reconstruction error from any possible path in the Haar error tree is guaranteed to lie in the integer range R_(Z)=[−E_(max) log N,+E_(max) log N], where, of course, E_(max)≦R_(Z)2^(D). This observation directly leads to an optimal pseudo-polynomial time algorithm for the maximum absolute-error minimization problem. In fact, this pseudo-polynomial time scheme directly extends to maximum relative-error minimization as well. For the (1+ε)-approximation scheme for absolute error, the coefficients are intelligently scaled down so that the possible range of integer additive-error values entering a subtree is polynomially bounded.
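To see why the scaling claim holds in one dimension (a minimal sketch, not the patented procedure): each Haar averaging/differencing level divides by 2, and there are log N levels, so pre-multiplying integer data by N=2^(log N) clears every denominator.

    def haar_1d(values):
        """Unnormalized 1-D Haar decomposition via pairwise averaging/differencing."""
        coeffs, level = [], list(values)
        while len(level) > 1:
            pairs = list(zip(level[::2], level[1::2]))
            coeffs = [(a - b) / 2 for a, b in pairs] + coeffs  # detail coefficients
            level = [(a + b) / 2 for a, b in pairs]            # averages for next level
        return level + coeffs  # overall average, then details coarse-to-fine

    data = [3, 5, 10, 8]                           # integer data, N = 4
    print(haar_1d(data))                           # [6.5, -2.5, -1.0, 1.0]
    print(haar_1d([v * len(data) for v in data]))  # [26.0, -10.0, -4.0, 4.0]

In D dimensions the same argument applies once per dimension, giving the O(N^(D)) scaling factor quoted above.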

Briefly, the optimal pseudo-polynomial time scheme is again based on dynamic programming over the error tree of integer coefficient values and follows along similar lines as the additive-error approximation algorithm, but without doing any rounding of incoming error values. The algorithm starts by precomputing an error vector E(j,s) for each internal node j in the error tree and each subset s of the coefficient values in S(j). The E(j,s) vector basically stores the total additive error propagated to each of the children of node j if it is decided to drop the coefficients in s⊆S(j) from the synopsis; thus, the E(j,s) vector is indexed by the children of node j, and, for the i^(th) child node j_(i) of j, the corresponding i^(th) error-vector entry is defined as E(j,s)[i]=Σ_(c∈s) sign(c, j_(i))·c, where sign(c,j_(i)) is the sign of c's contribution to the j_(i) child subtree. It is easy to see that computing these error vectors E(j,s) for every j and every s⊆S(j) takes O(N2^(2^D−1)2^(D))=O(N2^(2^D+D)) time. Also, note that the E_(max) boundary in the integer error range R_(Z) can now be defined more precisely as E_(max)=max_(j,s,i){|E(j,s)[i]|}≦R_(Z)2^(D).
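A sketch of this precomputation for a single node (illustrative only; node_coeffs is a hypothetical list of (value, signs) pairs, with signs[i] = sign(c, j_(i)) for the i-th child):

    from itertools import combinations

    def error_vectors(node_coeffs, num_children):
        """Precompute E(j, s) for every subset s of the coefficients at a node.
        Maps each dropped subset (tuple of coefficient indexes) to its error
        vector: E(j, s)[i] = sum of sign(c, j_i) * c over c in s."""
        vectors = {}
        indexes = range(len(node_coeffs))
        for size in range(len(node_coeffs) + 1):
            for subset in combinations(indexes, size):
                vec = [0] * num_children
                for idx in subset:
                    value, signs = node_coeffs[idx]
                    for i in range(num_children):
                        vec[i] += signs[i] * value
                vectors[subset] = vec
        return vectors

With |S(j)| ≦ 2^(D)−1 coefficients per node, the loop enumerates at most 2^(2^D−1) subsets, matching the bound above.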

The exemplary pseudo-polynomial DP scheme then constructs a table M[j,b,e] for every node j, space budget b≦B, and error value e∈R_(Z), where M[j,b,e] denotes the minimum possible maximum absolute error in the T_(j) subtree, assuming a space budget of b coefficients in T_(j) and an error contribution of e due to ancestors of node j. As earlier, define M[j,0,e]=|e| for a leaf/data node j, whereas for an internal node j (with children j₁, . . . , j_(m)) and assuming that a subset s⊆S(j) of coefficients is dropped from the synopsis, compute M[j,b,e] as

$\min_{0 \leq b_{1}+\ldots+b_{m} \leq b-(2^{D}-1-|s|)}\ \max_{1 \leq i \leq m} \left\{ M\left[ j_{i},\ b_{i},\ E(j,s)[i]+e \right] \right\}.$

Thus, the final (optimal) value of M[j,b,e] is computed by dropping the coefficient subset s⊆S(j) that minimizes the above term. Once again, the O(B^(2^D)) factor needed to cycle through all the b₁, . . . , b_(m) allotments can be avoided using the generalization described earlier; thus, there is an optimal DP algorithm that runs in time O(E_(max)2^(2^D+D) N log N B log B). An observation is that if E_(max) is polynomially bounded, then the exemplary DP scheme described above is also a polynomial-time algorithm. This idea is used to devise a polynomial-time (1+ε)-approximation scheme.

Given a threshold parameter τ>0, define a truncated DP algorithm as follows. Let V_(≦τ) denote the set of error vectors E(j,s) for which the absolute value of each of their entries E(j,s)[i] is at most τ; that is, V_(≦τ)={E(j,s):|E(j,s)[i]|≦τ for all i}. Similarly, let V_(>τ) denote the set of error vectors E(j,s) not in V_(≦τ). Finally, define K_(τ) as the quantity

$K_{\tau} = {\frac{ɛ\tau}{\log \; N}.}$

The truncated DP algorithm replaces each error vector E(j,s) in the error tree with a scaled-down error vector

${E^{\tau}\left( {j,s} \right)} = {\left\lbrack \frac{E\left( {j,s} \right)}{K_{\tau}} \right\rbrack \left( {{i.e.},{{{E^{\tau}\left( {j,s} \right)}\lbrack i\rbrack} = \left\lbrack \frac{{E\left( {j,s} \right)}\lbrack i\rbrack}{K_{\tau}} \right\rbrack}} \right.}$

for all i) and works with these scaled (integer) error vectors.Furthermore, at any node j, the algorithm only considers droppingsubsets s a scaled-down error vector for S(j) such that E(j,s)εV_(≦τ)from the synopsis. More formally, build a DP array M^(τ)[j,b,e] usingthe scaled error vectors as follows. As previously, for a leaf node j,define M^(τ)[j,b,e]=|e|. For an internal node j, the algorithm cyclesthrough only those subsets sεS(j) such that E(j,s)εV_(≦τ). TheM^(τ)[j,b,e] entry is undefined if b<2^(D)−1−|s| for all s such thatE(j,s)εV_(τ). The DP recurrence for computing M^(τ)[j,b,e] is identicalto the one for the exemplary pseudo-polynomial scheme described above,except for the fact that E(j,s) is replaced by its scaled versionE^(τ)(i,S).
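The truncation step itself is mechanical; a sketch reusing the error_vectors output above (log N taken base 2, as in the analysis; the ceiling mirrors the ⌈·⌉ in the definition):

    import math

    def truncate_vectors(vectors, tau, eps, N):
        """Keep only subsets with E(j, s) in V_<=tau; scale entries down by K_tau."""
        k_tau = eps * tau / math.log2(N)
        scaled = {}
        for subset, vec in vectors.items():
            if all(abs(entry) <= tau for entry in vec):        # E(j, s) in V_<=tau
                scaled[subset] = [math.ceil(entry / k_tau) for entry in vec]
        return scaled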

It is claimed that the above truncated DP is a polynomial-time algorithm for any value of the threshold parameter τ. Indeed, since at each node j only coefficient subsets s with error vectors E(j,s) in V_(≦τ) are considered for dropping, the absolute additive error propagated to each child of j is guaranteed to be at most

${{\max_{i}{{{E^{\tau}\left( {j,s} \right)}\lbrack i\rbrack}}} \leq \frac{\tau}{K_{\tau}}} = {\frac{\log \; N}{ɛ}.}$

This, of course, implies that the absolute additive error that can enter any subtree in the exemplary truncated DP algorithm is at most log² N/ε; in other words, the range of possible (integer) incoming error values e for the truncated DP array M^(τ)[ ] is guaranteed to be only

$R_{}^{\tau} = {\left\lbrack {{{- \frac{1}{ɛ}}\log^{2}N},{{+ \frac{1}{ɛ}}\log^{2}N}} \right\rbrack.}$

Thus, based on the earlier analysis, the running time of the truncated DP algorithm for a fixed parameter τ is only

${O\left( {\frac{1}{ɛ}2^{2^{D} + D}N\; \log^{2}{NB}\; \log \; B} \right)}.$

Given a threshold parameter τ, the exemplary truncated DP algorithm selects a subset C_(τ) of coefficients to retain in the synopsis. The absolute-error approximation scheme employs the truncated DP algorithm for each value τ∈{2^(k):k=0, . . . , ⌈log(R_(Z)2^(D))⌉}, and finally selects the synopsis C_(τ) that minimizes the maximum absolute error in the data-value reconstruction. Clearly, since only O(D+log R_(Z)) different values for τ are tried, the running time of the approximation algorithm remains polynomial.
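The outer loop of the scheme then just sweeps τ over powers of two and keeps the best synopsis found (a sketch, with run_truncated_dp standing in for the truncated DP just described and assumed to return a (synopsis, max-absolute-error) pair):

    import math

    def approx_mult_d_err(run_truncated_dp, R_Z, D, B):
        """Try tau = 2^k for k = 0 .. ceil(log2(R_Z * 2^D)); keep the best synopsis."""
        best_synopsis, best_error = None, float("inf")
        for k in range(int(math.ceil(math.log2(R_Z * 2 ** D))) + 1):
            synopsis, error = run_truncated_dp(2 ** k, B)
            if error < best_error:
                best_synopsis, best_error = synopsis, error
        return best_synopsis, best_error

Only O(D+log R_(Z)) truncated-DP runs are performed, so the overall scheme stays polynomial.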

The above-described scheme gives a (1+ε)-approximation algorithm for maximum absolute error minimization. Consider the optimal maximum error synopsis C_(OPT) and let absErr(C_(OPT)) denote the corresponding maximum absolute error value. Also, for each internal node j in the error tree, let s*_(j)⊆S(j) denote the subset of coefficients dropped from the optimal synopsis C_(OPT) at node j. Finally, let C denote the maximum absolute value across all entries in the collection of error vectors E(j,s*_(j)) for all j; that is, C=max_(j,i){|E(j,s*_(j))[i]|}. Clearly, the approximation algorithm is going to try a threshold parameter, say τ′, such that τ′/2<C≦τ′. The goal is to show that the maximum absolute error achieved by C_(τ′) (i.e., absErr(C_(τ′))) is very close to that achieved by the optimal solution C_(OPT).

First, note that, by the definition of C and τ′, the optimal solution may drop a subset of coefficients s⊆S(j) at node j only if E(j,s)∈V_(≦τ′). Thus, C_(OPT) is a feasible solution to the truncated DP instance (with threshold=τ′).

Now let absErr_(τ′)(C_(τ′)) and absErr_(τ′)(C_(OPT)) denote the maximum absolute errors in the K_(τ′)-scaled instance for the C_(τ′) synopsis (obtained by the truncated DP scheme) and the optimal C_(OPT) synopsis, respectively. Given the optimality of the truncated DP for the scaled instance,

absErr_(τ′)(C_(τ′))≦absErr_(τ′)(C_(OPT)).  (8)

Let C be any wavelet synopsis (i.e., subset of Haar coefficients). In a K_(τ′)-scaled instance, any error-vector value for C is represented by

${{E^{\tau^{\prime}}\left( {j,s} \right)}\lbrack i\rbrack} = \left\lbrack \frac{{E\left( {j,s} \right)}\lbrack i\rbrack}{K_{\tau^{\prime}}} \right\rbrack$

which differs by at most 1 from

$\frac{{E\left( {J,s} \right)}\lbrack i\rbrack}{K_{\tau^{\prime}}};$

thus, it is easy to see that the scaled and non-scaled maximum absolute errors for the C synopsis are related as follows:

absErr(C)∈(K_(τ′)absErr_(τ′)(C)±K_(τ′) log N).  (9)

Applying the above formula for C=C_(OPT) and combining with equation (8),

absErr(C_(OPT)) ≥ K_(τ^(′))absErr_(τ^(′))(C_(OPT)) − K_(τ^(′))log  N ≥ K_(τ^(′))absErr_(τ^(′))(C_(τ^(′))) − K_(τ^(′))log  N,

and using equation (9) once again, this time with C=C_(τ′),

absErr(C_(τ′))≦K_(τ′)absErr_(τ′)(C_(τ′))+K_(τ′) log N.

Now, simply combining the last two formulas and substituting

$K_{\tau'} = \frac{ε\tau'}{\log N}, \quad \mathrm{absErr}(C_{\tau'}) \leq \mathrm{absErr}(C_{OPT}) + 2K_{\tau'}\log N \leq \mathrm{absErr}(C_{OPT}) + 2ε\tau'. \qquad (10)$

The goal now is to demonstrate that absErr(C_(OPT)) is at least Ω(τ′). The proof relies on the following ancillary claim. Claim: Let C be any Haar-coefficient synopsis and, for any internal node j, let e_(j)^(C) denote the absolute value of the additive error coming into the T_(j) subtree due to ancestors of node j dropped from the C synopsis. Then, absErr(C)≧e_(j)^(C).

Proof: The proof makes use of the following observation. Let T denote the (multi-dimensional) Haar error tree (FIG. 2) where all coefficient values are retained (to allow for exact data reconstruction). Given an internal node j, let v(T_(j)) denote the total additive contribution from ancestors of node j during the data-reconstruction process for the data values at the leaves of the T_(j) subtree (rooted at j). Clearly, this additive contribution v(T_(j)) is the same for all leaves of T_(j), and, letting path(j) denote the set of all proper ancestor nodes of j in T, v(T_(j)) is simply computed as v(T_(j))=Σ_(c_i∈path(j)) sign_(i,j)·c_(i), where sign_(i,j) is either +1 or −1 depending on how the c_(i) coefficient affects the data values in the T_(j) subtree. One of the basic properties of the Haar-wavelet decomposition is that v(T_(j)) is exactly the average of all the data values at the leaf nodes of the T_(j) subtree. This is quite intuitive, as the Haar-wavelet coefficients are essentially trying to summarize the data at successively coarser levels of resolution.

Now, let T^(C) denote the error tree where only the coefficients in C are retained (with all other coefficients set to 0). Note that, by the definition of e_(j)^(C), e_(j)^(C)=|v(T_(j))−v(T_(j)^(C))|. Also, let j₁, . . . , j_(m) be the children of node j in T and T^(C). The observation above directly implies that v(T_(j)) is the average of the child-subtree values v(T_(j_i)), i.e.,

${v\left( T_{j} \right)} = {\frac{1}{m}{\sum\limits_{i = 1}^{m}{v\left( T_{j} \right)}}}$

(and, of course, the same also holds for the corresponding approximate values v(T_(j)^(C)) and v(T_(j₁)^(C)), . . . , v(T_(j_m)^(C))). Consider the absolute incoming additive errors e_(j_i)^(C) at child nodes j_(i) of j. The discussion earlier implies the following inequality:

$e_{j}^{C} = \left| v(T_{j}) - v(T_{j}^{C}) \right| = \frac{1}{m}\left| \sum_{i=1}^{m} v(T_{j_{i}}) - \sum_{i=1}^{m} v(T_{j_{i}}^{C}) \right| \leq \frac{1}{m}\sum_{i=1}^{m} e_{j_{i}}^{C}.$

Thus, there must exist at least one child node, say j_(k), of j such that e_(j_k)^(C)≧e_(j)^(C). Continuing this argument, a path is obtained from node j to a leaf/data value d_(l) in the T_(j)^(C) subtree such that the absolute additive error entering that leaf, i.e., |d̂_(l)−d_(l)|, is at least e_(j)^(C). But, then, absErr(C)≧|d̂_(l)−d_(l)|≧e_(j)^(C). This completes the proof of the claim.

Lemma: Let C_(OPT) be the optimal maximum absolute error synopsis and let τ′ be as defined above. Then,

${{absErr}\left( C_{OPT} \right)} > {\frac{\tau}{4}.}$

Proof: By the definition of τ′, there exists an error-tree node j such that the subset of dropped coefficients s*_(j)⊆S(j) for j in the optimal solution C_(OPT) satisfies

${{{E\left( {j,s_{j}^{*}} \right)}\lbrack k\rbrack}} \geq \frac{\tau^{\prime}}{2}$

for some child j_(k) of j. Now consider two possible cases for e_(j)^(C_OPT), the absolute additive error entering the subtree rooted at node j in the optimal solution:

(1) If $e_{j}^{C_{OPT}} \geq \frac{\tau'}{4}$,

then, by the above claim,

${{absErr}\left( C_{OPT} \right)} \geq {\frac{\tau^{\prime}}{4}.\begin{matrix}{{{{If}\mspace{14mu} e_{j}^{C_{OPT}}} < \frac{\tau^{\prime}}{4}},} & (2)\end{matrix}}$

then the absolute value of the additive error entering the subtree rooted at child j_(k) of j is at least

${{\frac{\tau^{\prime}}{2} - \frac{\tau^{\prime}}{4}} = \frac{\tau^{\prime}}{4}};$

thus, the claim again implies that

${{absErr}\left( C_{OPT} \right)} \geq {\frac{\tau^{\prime}}{4}.}$

This completes the proof.

Combining inequality (10) with the lemma,

absErr(C_(τ′))≦(1+8ε)·absErr(C_(OPT)).

Thus, simply setting ε′=ε/8, there is a (1+ε)-approximation scheme for maximum absolute error minimization in multiple dimensions.

In summary, there is an exemplary approximation scheme for deterministic multi-dimensional wavelet thresholding that discovers an approximate solution that is guaranteed to be within a factor of (1+ε) of the optimal (i.e., minimum) maximum absolute error in time

$O\left( {\frac{\log \; R_{\mathbb{Z}}}{2^{ɛ}}2^{2^{D_{+ D}}}N\; \log^{2}{NB}\; \log \; B} \right)$

and with a total space requirement of

${O\left( {\frac{1}{ɛ}2^{D}N\; \log^{2}{NB}} \right)}.$

Extension to General Error Metrics

Exemplary embodiments described above include the minimization of maximum-error metrics (like maximum relative error) in the approximate data-value reconstruction. However, exemplary embodiments of the present invention have much more general applicability. Once again, this is in sharp contrast with prior art thresholding schemes that can handle only maximum-error metrics. Consider the natural class of distributive error metrics defined formally below.

Distributive Error Metrics

Consider an approximation of a (one- or multi-dimensional) data array A, and let f(R) denote the error in the data-value approximation over the (one- or multi-dimensional) range of values R in A. We say that the error metric f( ) is distributive if and only if, for any collection of disjoint ranges R₁, . . . , R_(k), there exists some combining function g( ) such that the error over the entire region ∪_(i=1)^(k) R_(i) can be defined as

f(∪_(i=1)^(k) R_(i))=g(f(R₁), . . . , f(R_(k))).

The above-defined class of distributive error metrics encompasses several important approximation-error functions. For instance, the maximum-error metrics defined above are clearly distributive (with the combining function g( ) being simply the max{ } of its input arguments). Furthermore, it is not hard to see that most important cumulative approximation-error metrics are also naturally distributive, including the mean relative error

$\frac{1}{N}{\sum_{i}{relErr}_{i}}$

and the L_(p)-norm error

$\left[ \frac{1}{N}\sum_{i} \left| \hat{d}_{i} - d_{i} \right|^{p} \right]^{\frac{1}{p}}$

(for any p≧0) in the data reconstruction, as well as the weighted variants of these metrics

$\left( \frac{1}{N}\sum_{i} w_{i} \cdot relErr_{i} \ \text{ and } \ \left[ \frac{1}{N}\sum_{i} w_{i} \cdot \left| \hat{d}_{i} - d_{i} \right|^{p} \right]^{\frac{1}{p}}, \right.$

respectively), where different weights w_(i) (typically normalized, such that Σ_(i)w_(i)=1) are associated with the errors for different values in the underlying data domain. Such weights are an important tool for capturing the importance of individual data values, for example, based on the non-uniformities of the observed query workload.
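As a quick, purely illustrative sanity check of distributivity: for maximum error the combining function g( ) is max, while for cumulative metrics it is summation.

    # Per-value absolute errors over two disjoint ranges of the data domain.
    r1_errors, r2_errors = [0.5, 1.0], [0.2, 3.0]

    # g = max for the maximum-error metric:
    assert max(r1_errors + r2_errors) == max(max(r1_errors), max(r2_errors))

    # g = sum for cumulative metrics (e.g., mean or weighted relative error):
    assert sum(r1_errors + r2_errors) == sum(r1_errors) + sum(r2_errors)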

An observation is that the exemplary optimal DP formulation can, in fact, be easily adapted to work with any distributive error metric. Given such an error metric f( ), the basic idea is to define the DP array M[j,b,S] as well as the DP recurrences for computing its entries using the general form of the f( ) metric and the corresponding combining function g( ) that allows the computation of f( ) to be "distributed" over sub-ranges of the data domain. More specifically, define the base case for the exemplary DP recurrence (i.e., for leaf nodes c_(j)=d_(j−N), j≧N) as M[j,0,S]=f({d_(j−N)}|S) for each subset S⊆path(d_(j−N)), where f({d_(j−N)}|S) denotes the value of the f( ) error metric at data value d_(j−N) assuming the coefficients in S are kept in the synopsis. Now, in the case of an internal node c_(j) with j<N, define the optimal error M[j,b,S] when coefficient c_(j) is either dropped from or kept in the synopsis in a manner similar to equations (2)-(3), with the difference that max{ } is now replaced by the combining function g( ) for the distributive error metric f( ). More formally, the general DP recurrence for f( ) simply defines M[j,b,S]=min{M_(drop)[j,b,S],M_(keep)[j,b,S]}, where

${{M_{drop}\left\lbrack {j,b,S} \right\rbrack} = {\begin{matrix}\min \\{0 \leq b^{\prime} \leq b}\end{matrix}{g\left( {{M\left\lbrack {{2j},b^{\prime},S} \right\rbrack},{M\left\lbrack {{{2j} + 1},{b - b^{\prime}},S} \right\rbrack}} \right)}}},{and}$${M_{keep}\left\lbrack {j,b,S} \right\rbrack} = {\begin{matrix}\min \\{0 \leq b^{\prime} \leq {b - 1}}\end{matrix}{{g\begin{pmatrix}{{M\left\lbrack {{2j},b^{\prime},{S\bigcup\left\{ c_{j} \right\}}} \right\rbrack},} \\{M\begin{bmatrix}{{{2j} + 1},{b - b^{\prime}},} \\{S\bigcup\left\{ c_{j} \right\}}\end{bmatrix}}\end{pmatrix}}.}}$

For example, as a more concrete problem setting, consider the adaptation of the optimal DP formulation for the case of the mean weighted relative error metric

$\frac{1}{N\;}{\sum_{i}{w_{i} \cdot {{relErr}_{i}.}}}$

Since the averaging factor N is basically constant (i.e., the size of the data domain), it is obviously sufficient to optimize the cumulative weighted relative error; that is, we seek to minimize

${{f\left( \left\{ {d_{1},\ldots \mspace{14mu},d_{N}} \right\} \right)} = {{\sum\limits_{i}{w_{i} \cdot {relErr}_{i}}} = {\sum\limits_{i}\frac{\left. {w_{i} \cdot} \middle| {d_{i} - d_{i}} \right.}{\max \left\{ {{d_{i}}s} \right\}}}}},$

(for a given sanity bound value s).

For the base case of data (i.e., leaf) nodes c_(j)=d_(j−N) with j≧N, we define M[j,0,S] (for each subset S⊆path(d_(j−N))) as the weighted relative error at value d_(j−N) (assuming the coefficients in S are retained in the synopsis); that is,

${M\left\lbrack {j,0,S} \right\rbrack} = {\left. {f\left( \left\{ d_{j - N} \right\} \right)} \middle| S \right. = \frac{w_{j - N} \cdot {{d_{j - N} - {\sum\limits_{c_{k} \in S}{{sign}_{{j - N},k} \cdot c_{k}}}}}}{\max \left\{ {{d_{j - N}},s} \right\}}}$

For the case of internal nodes with j<N, note that the combining function g( ) for the cumulative weighted relative error metric is simple summation; thus, we define M[j,b,S]=min{M_(drop)[j,b,S],M_(keep)[j,b,S]}, where

${{M_{drop}\left\lbrack {j,b,S} \right\rbrack} = {\begin{matrix}\min \\{0 \leq b^{\prime} \leq b}\end{matrix}\left\{ {{M\left\lbrack {{2j},b^{\prime},S} \right\rbrack} + {M\left\lbrack {{{2j} + 1},{b - b^{\prime}},S} \right\rbrack}} \right\}}},{and}$${M_{keep}\left\lbrack {j,b,S} \right\rbrack} = {\begin{matrix}\min \\{0 \leq {{b^{\prime}b} - 1}}\end{matrix}{\begin{Bmatrix}{{M\left\lbrack {{2j},b^{\prime},{S\bigcup\left\{ c_{j} \right\}}} \right\rbrack} +} \\{M\left\lbrack {{{2j} + 1},{b - b^{\prime} - 1},{S\bigcup\left\{ c_{j} \right\}}} \right\rbrack}\end{Bmatrix}.}}$

Since the exemplary optimal DP formulation extends directly to multi-dimensional data arrays and wavelets, the above-described generalizations are directly applicable to multiple dimensions as well; that is, of course, modulo the super-exponential explosion in space/time complexity with increasing dimensionality. Furthermore, note that both of the exemplary efficient approximation schemes for near-optimal thresholding in multiple dimensions are essentially based on sparser versions of this optimal multi-dimensional dynamic program. Thus, the exemplary maximum-error approximation algorithms can also be adapted to work with other distributive error metrics with similar approximation guarantees; for instance, the techniques and analyses above can easily be extended to (approximately) optimize for the corresponding cumulative error metrics (i.e., mean absolute and mean relative error).

FIG. 7 shows input and output for an exemplary embodiment of a (1+ε)-approximation scheme for absolute error minimization for deterministic multi-dimensional wavelet thresholding (ApproxMultIDErr) 700. Inputs to ApproxMultIDErr 700 include a wavelet array (W_(A)) 702 and a number of retained coefficients (B) 704. Outputs include the ε-approximate absolute error synopsis of wavelet coefficients 706. Some notation includes R_(Z)=maximum absolute coefficient value, N=maximum dimension size, and S(j)=set of coefficients at node j.

FIGS. 8A-8C are flowcharts describing the exemplary embodiment of the scheme ApproxMultIDErr of FIG. 7. In this example, procedure ApproxMultIDErr starts at 800. An error vector is computed for each node at 802. Initializations are performed at 804 and the top of a loop starts at 806, 808, where τ=2^(k). Then, V, V_(≦τ), and V_(>τ)=V−V_(≦τ) are computed at 810. Next, K_(τ) is computed and the τ-scaled-down error vectors E^(τ)(j,s) are defined at 812. Using dynamic programming over the τ-scaled-down error vectors, the array M^(τ)[j,b,e] is computed, which is the minimum maximum absolute error at node j with b coefficients and incoming error of e, and the resulting synopsis C_(τ) is defined at 814. It is determined whether M^(τ)[root,B,0] is less than the best error at 816. If so, the best error is set to M^(τ)[root,B,0] and the best synopsis is set to C_(τ); otherwise, the end-of-loop test is performed at 820. If it is not the end of the loop, the loop index is incremented at 822 and control flows back via branch 806 to 808; otherwise, the best synopsis is returned at 824.

Wavelets have a long history of successes in the signal and image processing arena and, recently, they have also found their way into data-management applications. Haar-wavelet coefficients have been used as synopses for accurately estimating the selectivities of range queries. I/O-efficient algorithms exist for building multi-dimensional Haar wavelets from large relational data sets, and a small set of wavelet coefficients has been shown to efficiently provide accurate approximate answers to range aggregates over OLAP cubes. Haar wavelets have also been used as a general-purpose approximate query processing tool, with efficient algorithms that can process complex relational queries (with joins, selections, etc.) entirely in the wavelet-coefficient domain. The problem of on-line maintenance for coefficient synopses can be solved with a probabilistic-counting technique that approximately maintains the largest normalized-value coefficients in the presence of updates. Algorithms exist for building approximate one-dimensional Haar-wavelet synopses over numeric data streams. There are also time- and space-efficient techniques for constructing Haar-wavelet synopses for data sets with multiple measures (such as those typically found in OLAP applications).

All of the above applications rely on conventional L₂-error-based thresholding schemes that typically decide the significance of a coefficient based on its absolute normalized value. Such conventional wavelet synopses can suffer from several problems, including the introduction of severe bias in the data reconstruction and wide variance in the quality of the data approximation, as well as a lack of non-trivial guarantees for individual approximate answers. In contrast, recently proposed probabilistic wavelet synopses rely on a probabilistic thresholding process based on randomized rounding that tries to probabilistically control the maximum relative error in the synopsis by minimizing appropriate probabilistic metrics, like normalized standard error or normalized bias. The problem addressed by the present invention, namely the design of efficient deterministic thresholding schemes for maximum-error metrics, has not been addressed by the prior art.

There is a rich mathematical literature on m-term approximations using wavelets, where m is the number of coefficients in the synopsis. Some prior work has studied thresholding approaches for meeting a target upper bound for an L_(p)-error metric. However, the prior art does not address the deterministic minimization of relative errors with sanity bounds, which is an important scenario for approximate query processing in databases. The present invention provides computationally-efficient (optimal and near-optimal) deterministic thresholding schemes for minimizing maximum-error metrics for one- and multi-dimensional wavelet summaries.

Embodiments of the present invention include novel, computationally-efficient schemes for deterministic maximum-error wavelet thresholding in one and multiple dimensions. For one-dimensional wavelets, an optimal, low polynomial-time thresholding algorithm based on a new dynamic-programming formulation minimizes either the maximum relative error or the maximum absolute error in the data approximation. For the multi-dimensional case, embodiments include novel, polynomial-time approximation schemes with tunable ε-approximation guarantees for the target maximum-error metric, based on approximate dynamic programs.

FIG. 9 is a high level block diagram showing a computer. The computer 900 may be employed to implement embodiments of the present invention. The computer 900 comprises a processor 930 as well as memory 940 for storing various programs 944 and data 946. The memory 940 may also store an operating system 942 supporting the programs 944.

The processor 930 cooperates with conventional support circuitry such as power supplies, clock circuits, cache memory and the like, as well as circuits that assist in executing the software routines stored in the memory 940. As such, it is contemplated that some of the steps discussed herein as software methods may be implemented within hardware, for example, as circuitry that cooperates with the processor 930 to perform various method steps. The computer 900 also contains input/output (I/O) circuitry that forms an interface between the various functional elements communicating with the computer 900.

Although the computer 900 is depicted as a general purpose computer that is programmed to perform various functions in accordance with the present invention, the invention can be implemented in hardware as, for example, an application specific integrated circuit (ASIC) or field programmable gate array (FPGA). As such, the process steps described herein are intended to be broadly interpreted as being equivalently performed by software, hardware, or a combination thereof.

The present invention may be implemented as a computer program product wherein computer instructions, when processed by a computer, adapt the operation of the computer such that the methods and/or techniques of the present invention are invoked or otherwise provided. Instructions for invoking the inventive methods may be stored in fixed or removable media, transmitted via a data stream in a broadcast media or other signal bearing medium, and/or stored within a working memory within a computing device operating according to the instructions.

While the foregoing is directed to various embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof. As such, the appropriate scope of the invention is to be determined according to the claims, which follow.

1. A method for deterministic wavelet thresholding for optimizing general-error metrics, comprising: conditioning a minimum error value for an error subtree not only on a root node of the error subtree and the amount of synopsis storage allotted, but also on an error that enters the subtree through coefficient selections made on a path from the root node to a particular node, excluding that particular node itself; and tabulating all possible coefficient selections in polynomial time.
2. The method of claim 1, further comprising: minimizing error in a data approximation; wherein the minimum error value is maximum relative error.
3. The method of claim 1, further comprising: minimizing error in a data approximation; wherein the minimum error value is maximum absolute error.
4. A method for deterministic wavelet thresholding for optimizing general-error metrics, comprising: approximately covering a range of all possible error contributions for paths up to a root node of an error subtree using a plurality of approximate error values; tabulating the approximate error values by capturing approximate error in a plurality of error subtrees, each error subtree being rooted at a particular node; and tabulating a rounded additive error contribution due to ancestors of the particular node being discarded from a synopsis.
5. The method of claim 4, wherein the error values are relative.
6. The method of claim 4, wherein the error values are absolute.
7. A method for deterministic wavelet thresholding for optimizing general-error metrics, comprising: determining error value vectors for each of a plurality of coefficient subsets; scaling-down the error value vectors using a threshold value; using dynamic programming over the scaled error value vectors, cycling only through coefficient subsets whose error value vector entries are at most the threshold value to determine an optimal solution for a scaled problem instance; and trying an appropriate collection of additional threshold values and returning the synopsis with the smallest error.